There are two versions of this script, the first is for running each simulation to the end and then saving the final step as the output of the model and the second is to save outputs along the way so we can evaluate how different models change through time.

Here is the first, and primary, version:



#####################################################################

# Run the full model in a cluster. This version writes files to a cluster output folder.
# rm(list = ls())
# install.packages("~/Desktop/FARM_1.0.tar.gz", repos=NULL, type="source")


#####################################################################
## need to document which functions we use from each of these libraries. 
library(ape)
library(spdep)
library(Rcpp)
library(msm)
library(FARM)


sim_run_cluster <- function(replicate_cycle, myWorld, number_of_time_steps, nbs,
                            number_of_tips, number_of_neighbors, origins, start = NULL) {
  # Calls the full simulation script 
  #  
  # Purpose: Need to wrap the entire simulation script into a function so it can be called in parallel from a cluster call  
  #
  # Args:
  #    replicate_cycle: An integer indicating the replicate number of a simulation. This variable is used in this function to label        
  #         the saved output file and control the number of replicates run by the cluster.
  #
  #    combo_number: An interger between 1 and 31 indicating the combinations of S, E, A, D, and T modules to be included 
  #         in the simulation. The full list of these combinations can be printed using the function combo_of_choice(28, TRUE).
  #         We are currently using combinations 25,28,29,and 31 as our four competing models for the spread of agriculture.  
  #
  #    myWorld: Matrix that defines the scope of the available world and acts as a data hub for organizing and reporting      
  #         results from the different elements of the simulation. 
  #
  #    number_of_time_steps: An integer indicating how many iterations the simulation will calculated before writing the data 
  #         file. 
  #
  #    nbs: A list of the available neighbors for each spatial point. This is passed to the function for calculating the interaction 
  #         of neighbors through time. 
  #
  #    number_of_tips: An interger indicating the number of tree tips the simulation should be truncated to. The default is to 
  #         include all the available tips (e.g. 1254 for human languages). 
  #
  # Returns: 
  #    myOut: A list object containing a 'phylo' tree object called mytree in the first position and the myWorld matrix of 
  #         spatial and tree data in the second position 
  #     
  

  x1 <- 4 #Number of runs per core
  sampleer <- sample(c(1,2,5,6), x1)
  #if (replicate_cycle != 1) {
  #  replicate_cycle <- ((replicate_cycle - 1) * x1) + 1
 # }
 # replicate_cycle <- replicate_cycle:(replicate_cycle + (x1 - 1))
  for (count in sampleer) {
  independent <- 1 

    
    # Probability of Arisal
    prob_choose_a <- rev(sort(rexp(4, rate = 9)))
    prob_choose_a <- prob_choose_a[c(sample(1:2, 2), sample(3:4, 2))]
    prob_choose_a[3] <- 0
    P.Arisal0  <- parameters(prob_choose_a[1], prob_choose_a[4],
                             prob_choose_a[3], prob_choose_a[2],
                             "Env_NonD", "Env_D",
                             "Evol_to_F", "Evol_to_D")
    # P.Arisal0 is the one you should change the parameters
    P.Arisal <- matrix(NA, ncol = 2, nrow = nrow(myWorld)) # probability per cell
    colnames(P.Arisal) <- c("Evolve_to_F", "Evolve_to_D")
    Env.Dom <- myWorld[, 7] == 2
    P.Arisal[Env.Dom, 1] <- P.Arisal0[1, 2]
    P.Arisal[!Env.Dom, 1] <- P.Arisal0[1, 1]
    P.Arisal[Env.Dom, 2] <- P.Arisal0[2, 2]
    P.Arisal[!Env.Dom, 2] <- P.Arisal0[2, 1]
    
    colnames(P.Arisal) <- c("Prob_of_Foraging", "Prob_of_Domestication")
    P.Arisal[which(origins == FALSE), 2]  <- 0
    
    #####
    #prob_choose <- runif(12, 0.01, 1)
    #sub <- (prob_choose[1] - 0.01)
    #sub <- ifelse(sub < .1, .1, sub)
    #prob_choose[c(4)] <- runif(1, 0.01, sub)
    #prob_choose[c(5)] <- runif(1, 0.1, 1) # High extinction
    #prob_choose[c(6)] <- runif(1, 0, (prob_choose[3] - 0.01))
    #prob_choose[c(9, 10, 12)] <- runif(3, 0.01, prob_choose[11])
    
    ####
    prob_choose <- runif(12, 0, 1)
    top <- min(prob_choose[c(1,3)], na.rm=TRUE)
    prob_choose[c(2)] <- runif(1, 0, top)
    
    prob_choose[c(5)] <- runif(1, 0, prob_choose[c(2)]) 
    prob_choose[c(6)] <- runif(1, 0, prob_choose[c(5)])
    prob_choose[c(4)] <- runif(1, prob_choose[c(6)], prob_choose[c(5)])
    
        
    if (count == 1) {
      prob_choose[7:12] <- 0
    }
    if (count == 2) {
      prob_choose[9:12] <- 0
    }
    if (count == 3 | count == 5) {
      prob_choose[7:8] <- 0
      independent <- 0
    }
    if (count == 4 | count == 6) {
      independent <- 0
    }
    
    
    P.speciation <- parameters(prob_choose[1], prob_choose[1],
                               prob_choose[2], prob_choose[3],
                               "Env_NonD", "Env_D", "For", "Dom")

    P.extinction  <- parameters(prob_choose[4], prob_choose[4],
                                prob_choose[5], prob_choose[6],
                                "Env_NonD", "Env_D", "For", "Dom")

    
    P.diffusion <- parameters(0, prob_choose[7],
                              prob_choose[8], 0,
                              "Target_For", "Target_Dom",
                              "Source_For", "Source_Dom")
    
    P.TakeOver <- parameters(prob_choose[9], prob_choose[10],
                             prob_choose[11], prob_choose[12],
                             "Target_For", "Target_Dom",
                             "Source_For", "Source_Dom")
    multiplier <- 1 # always 1 now.
   if (count %in% 1:4) { 
        myOut <- RunSimUltimate(myWorld, P.extinction, P.speciation, 
                            P.diffusion, P.Arisal, P.TakeOver, nbs, independent,
                            N.steps = number_of_time_steps, silent = FALSE, 
                            multiplier = multiplier, start = start)
                            }
   if (count %in% 5:6) {
        myOut <- RunSimUltimate.push(myWorld, P.extinction, P.speciation, 
                            P.diffusion, P.Arisal, P.TakeOver, nbs, independent,
                            N.steps = number_of_time_steps, silent = FALSE, 
                            multiplier = multiplier, start = start)
                            }
                            
                            
    # Count refers to the combo, 1 = null, 2 = diffusion, 3 = Takeover, 4 = full
    save(myOut,  file= paste0("./Module_1_outputs/myOut_rep_",
                              formatC(replicate_cycle, width = 2,flag = 0),
                              "_combo_",
                              formatC(count, width = 2,flag = 0),
                              "_","params", "_P.speciation_",
                              paste(formatC(P.speciation, width = 2,flag = 0), collapse="_"),"_P.extinct_",
                              paste(formatC(P.extinction, width = 2,flag = 0), collapse="_"), "_P.diffus_",
                              paste(formatC(P.diffusion, width = 2,flag = 0), collapse="_"), "_P.TO_",
                              paste(formatC(P.TakeOver, width = 2,flag = 0), collapse="_"),"_P.Arisal_",
                              paste(formatC(P.Arisal0, width = 2,flag = 0), collapse="_"),
                              "_timesteps_", number_of_time_steps, "_NBS_", number_of_neighbors, "_.Rdata"))

    Sim_statistics <- Module_2(myOut)

    save(Sim_statistics, file= paste0("./Module_2_outputs/Sim_stats_rep_",
                                      formatC(replicate_cycle, width = 2,flag = 0),
                                      "_combo_",
                                      formatC(count, width = 2,flag = 0),
                                      "_","params", "_P.speciation_",
                                      paste(formatC(P.speciation, width = 2,flag = 0), collapse="_"),"_P.extinct_",
                                      paste(formatC(P.extinction, width = 2,flag = 0), collapse="_"), "_P.diffus_",
                                      paste(formatC(P.diffusion, width = 2,flag = 0), collapse="_"), "_P.TO_",
                                      paste(formatC(P.TakeOver, width = 2,flag = 0), collapse="_"),"_P.Arisal_",
                                      paste(formatC(P.Arisal0, width = 2,flag = 0), collapse="_"),
                                      "_timesteps_", number_of_time_steps, "_NBS_", number_of_neighbors, "_.Rdata"))
  }
  
}

#####################################################################
coords <- as.matrix(apply(language_centroids[, 3:4], 2, as.numeric)) #coords
conds <- ifelse(suitability2 == 0, 1, 2)
conds[is.na(conds)] <- sample(c(1, 2), sum(is.na(conds)), replace = TRUE) 
origins <- language_centroids[, 5]

##### Specify simulation parameters #################################

number_of_tips <- length(coords[,1])
number_of_time_steps_a <- 30000
#replicate_cycle <- c(1)  #number of replicates
#####################################################################
data("parameters.table")


system.time(
  myWorld <- BuildWorld(coords, conds)
)
number_of_neighbors <- sample(5:9,1)

nbs <- knn2nb(knearneigh(coords, k = number_of_neighbors, longlat = TRUE),
              sym = TRUE) # 7 symmetric neighbors
n.obs <- sapply(nbs, length)
seq.max <- seq_len(max(n.obs))
nbs <- t(sapply(nbs, "[", i = seq.max))

dim(myWorld)


# NAI <- 1000
args <- commandArgs(trailingOnly = FALSE)
print(args)
NAI <- as.numeric(args[7])
#setwd("~/Box Sync/colliding ranges/Simulations_humans/big world cluster outputs")


# Starting point
start <- sample((1:nrow(language_centroids))[as.logical(language_centroids[, 6])], 1)

#sim_run_cluster(replicate_cycle = NAI,
#                myWorld, number_of_time_steps = number_of_time_steps_a, 
#                nbs, number_of_tips = nrow(myWorld), number_of_neighbors= number_of_neighbors, #origins=origins,start = start)

Here is the second version



#####################################################################

# Run the full model in a cluster. This version writes files to a cluster output folder.
# rm(list = ls())
# install.packages("~/Desktop/FARM_1.0.tar.gz", repos=NULL, type="source")


#####################################################################
## need to document which functions we use from each of these libraries. 
library(ape)
library(spdep)
library(Rcpp)
library(msm)
library(FARM)


sim_run_cluster <- function(replicate_cycle, myWorld, number_of_time_steps, nbs,
                            number_of_tips, number_of_neighbors, origins, start = NULL) {
  # Calls the full simulation script 
  #  
  # Purpose: Need to wrap the entire simulation script into a function so it can be called in parallel from a cluster call  
  #
  # Args:
  #    replicate_cycle: An integer indicating the replicate number of a simulation. This variable is used in this function to label        
  #         the saved output file and control the number of replicates run by the cluster.
  #
  #    combo_number: An interger between 1 and 31 indicating the combinations of S, E, A, D, and T modules to be included 
  #         in the simulation. The full list of these combinations can be printed using the function combo_of_choice(28, TRUE).
  #         We are currently using combinations 25,28,29,and 31 as our four competing models for the spread of agriculture.  
  #
  #    myWorld: Matrix that defines the scope of the available world and acts as a data hub for organizing and reporting      
  #         results from the different elements of the simulation. 
  #
  #    number_of_time_steps: An integer indicating how many iterations the simulation will calculated before writing the data 
  #         file. 
  #
  #    nbs: A list of the available neighbors for each spatial point. This is passed to the function for calculating the interaction 
  #         of neighbors through time. 
  #
  #    number_of_tips: An interger indicating the number of tree tips the simulation should be truncated to. The default is to 
  #         include all the available tips (e.g. 1254 for human languages). 
  #
  # Returns: 
  #    myOut: A list object containing a 'phylo' tree object called mytree in the first position and the myWorld matrix of 
  #         spatial and tree data in the second position 
  #     
  

  x1 <- 4 #Number of runs per core
  sampleer <- sample(c(1,2,5,6), x1)
  #if (replicate_cycle != 1) {
  #  replicate_cycle <- ((replicate_cycle - 1) * x1) + 1
 # }
 # replicate_cycle <- replicate_cycle:(replicate_cycle + (x1 - 1))
  for (count in sampleer) {
  independent <- 1 

    
    # Probability of Arisal
    prob_choose_a <- rev(sort(rexp(4, rate = 9)))
    prob_choose_a <- prob_choose_a[c(sample(1:2, 2), sample(3:4, 2))]
    prob_choose_a[3] <- 0
    P.Arisal0  <- parameters(prob_choose_a[1], prob_choose_a[4],
                             prob_choose_a[3], prob_choose_a[2],
                             "Env_NonD", "Env_D",
                             "Evol_to_F", "Evol_to_D")
    # P.Arisal0 is the one you should change the parameters
    P.Arisal <- matrix(NA, ncol = 2, nrow = nrow(myWorld)) # probability per cell
    colnames(P.Arisal) <- c("Evolve_to_F", "Evolve_to_D")
    Env.Dom <- myWorld[, 7] == 2
    P.Arisal[Env.Dom, 1] <- P.Arisal0[1, 2]
    P.Arisal[!Env.Dom, 1] <- P.Arisal0[1, 1]
    P.Arisal[Env.Dom, 2] <- P.Arisal0[2, 2]
    P.Arisal[!Env.Dom, 2] <- P.Arisal0[2, 1]
    
    colnames(P.Arisal) <- c("Prob_of_Foraging", "Prob_of_Domestication")
    P.Arisal[which(origins == FALSE), 2]  <- 0
    
    #####
    #prob_choose <- runif(12, 0.01, 1)
    #sub <- (prob_choose[1] - 0.01)
    #sub <- ifelse(sub < .1, .1, sub)
    #prob_choose[c(4)] <- runif(1, 0.01, sub)
    #prob_choose[c(5)] <- runif(1, 0.1, 1) # High extinction
    #prob_choose[c(6)] <- runif(1, 0, (prob_choose[3] - 0.01))
    #prob_choose[c(9, 10, 12)] <- runif(3, 0.01, prob_choose[11])
    
    ####
    prob_choose <- runif(12, 0, 1)
    top <- min(prob_choose[c(1,3)], na.rm=TRUE)
    prob_choose[c(2)] <- runif(1, 0, top)
    
    prob_choose[c(5)] <- runif(1, 0, prob_choose[c(2)]) 
    prob_choose[c(6)] <- runif(1, 0, prob_choose[c(5)])
    prob_choose[c(4)] <- runif(1, prob_choose[c(6)], prob_choose[c(5)])
    
        
    if (count == 1) {
      prob_choose[7:12] <- 0
    }
    if (count == 2) {
      prob_choose[9:12] <- 0
    }
    if (count == 3 | count == 5) {
      prob_choose[7:8] <- 0
      independent <- 0
    }
    if (count == 4 | count == 6) {
      independent <- 0
    }
    
    
    P.speciation <- parameters(prob_choose[1], prob_choose[1],
                               prob_choose[2], prob_choose[3],
                               "Env_NonD", "Env_D", "For", "Dom")

    P.extinction  <- parameters(prob_choose[4], prob_choose[4],
                                prob_choose[5], prob_choose[6],
                                "Env_NonD", "Env_D", "For", "Dom")

    
    P.diffusion <- parameters(0, prob_choose[7],
                              prob_choose[8], 0,
                              "Target_For", "Target_Dom",
                              "Source_For", "Source_Dom")
    
    P.TakeOver <- parameters(prob_choose[9], prob_choose[10],
                             prob_choose[11], prob_choose[12],
                             "Target_For", "Target_Dom",
                             "Source_For", "Source_Dom")
    multiplier <- 1 # always 1 now.
   if (count %in% 1:4) { 
        myOut <- RunSimUltimate2(myWorld, P.extinction, P.speciation, P.diffusion, P.Arisal, 
    P.TakeOver, nbs, independent, number_of_time_steps, multiplier, silent = TRUE, 
    count, resolution = seq(1, number_of_time_steps, 100), P.Arisal0, start = NULL)
                            }
   if (count %in% 5:6) {
        myOut <- RunSimUltimate2.push(myWorld, P.extinction, P.speciation, P.diffusion, P.Arisal, 
    P.TakeOver, nbs, independent, number_of_time_steps, multiplier, silent = TRUE, 
    count, resolution = seq(1, number_of_time_steps, 100), P.Arisal0, start = NULL)
                            }
                            
                           
  }
  
}

#####################################################################
coords <- as.matrix(apply(language_centroids[, 3:4], 2, as.numeric)) #coords
conds <- ifelse(suitability2 == 0, 1, 2)
conds[is.na(conds)] <- sample(c(1, 2), sum(is.na(conds)), replace = TRUE) 
origins <- language_centroids[, 5]

##### Specify simulation parameters #################################

number_of_tips <- length(coords[,1])
number_of_time_steps_a <- 50000
#replicate_cycle <- c(1)  #number of replicates
#####################################################################
data("parameters.table")


system.time(
  myWorld <- BuildWorld(coords, conds)
)
number_of_neighbors <- sample(5:9,1)

nbs <- knn2nb(knearneigh(coords, k = number_of_neighbors, longlat = TRUE),
              sym = TRUE) # 7 symmetric neighbors
n.obs <- sapply(nbs, length)
seq.max <- seq_len(max(n.obs))
nbs <- t(sapply(nbs, "[", i = seq.max))

dim(myWorld)


# NAI <- 1000
args <- commandArgs(trailingOnly = FALSE)
print(args)
NAI <- as.numeric(args[7])
#setwd("~/Box Sync/colliding ranges/Simulations_humans/big world cluster outputs")


# Starting point
start <- sample((1:nrow(language_centroids))[as.logical(language_centroids[, 6])], 1)

#sim_run_cluster(replicate_cycle = NAI,
#                myWorld, number_of_time_steps = number_of_time_steps_a, 
#                nbs, number_of_tips = nrow(myWorld), number_of_neighbors= number_of_neighbors, origins=origins,start = start)
install.packages("DiagrammeR")
library(DiagrammeR)

grViz("
  digraph {
    
    node [shape = box
          fontname = Helvetica
          penwidth = 2.0]
    'Input the data hub'; 
    'Is there more than one societ in the world?'; 
    'Pick the only available society';
    'Randomly choos a spreader society';
    'Are there any empty neighboring locations?';
    'Attempt to Takover';
    'Is the spreader a Forager of Domesticator?';
    'is there neighbors with suitable enivonrment for domestication?';
    'Randomly choose a target neighbor';
    'Randomly choose a taret neighbor among the ones with suitable condition for domestication';
    'Randomly choos a number between 0-1. Is this number smaller than the probability of takeover for the confict category**?';
    'Spreader removers target from the space and phylogeny';
    'Nothing happens';
    'Speader occupies the target cell and bifurcate in the phylogeny';
    'Are there any more societies remaining (excluding new societies generated during this loop) that did not go through the expansion process?';
    'Attempt to dispersal';
    'Randomly choos an empty target location';
    'Randomly choos a number between 0-1. Is this number smaller than the probabilityy of speciation for the spreader societ trait in the pre=expanding location environment?';
    'output to data hub';

    edge []
    'Input the data hub' -> 'Is there more than one societ in the world?';
    'Is there more than one societ in the world?' -> 'Pick the only available society' [label = 'NO'];
    'Is there more than one societ in the world?' -> 'Randomly choos a spreader society' [label = 'YES'];
    'Pick the only available society' -> 'Are there any empty neighboring locations?';
    'Randomly choos a spreader society' -> 'Are there any empty neighboring locations?';
    'Are there any empty neighboring locations?' -> 'Attempt to dispersal' [label = 'YES'];
    'Are there any empty neighboring locations?' -> 'Attempt to Takover' [label = 'NO'];
    'Attempt to Takover' -> 'Is the spreader a Forager of Domesticator?';
    'Is the spreader a Forager of Domesticator?' -> 'is there neighbors with suitable enivonrment for domestication?' [label = 'Domesticator'];
    'Is the spreader a Forager of Domesticator?' -> 'Randomly choose a target neighbor' [label = 'Forager'];
    'is there neighbors with suitable enivonrment for domestication?' -> 'Randomly choose a target neighbor' [label = 'NO'];
    'is there neighbors with suitable enivonrment for domestication?' -> 'Randomly choose a taret neighbor among the ones with suitable condition for domestication' [label = 'YES'];
    'Randomly choose a taret neighbor among the ones with suitable condition for domestication' -> 'Randomly choos a number between 0-1. Is this number smaller than the probability of takeover for the confict category**?';
    'Randomly choose a target neighbor'  -> 'Randomly choos a number between 0-1. Is this number smaller than the probability of takeover for the confict category**?' ;
    'Randomly choos a number between 0-1. Is this number smaller than the probability of takeover for the confict category**?' -> 'Spreader removers target from the space and phylogeny' [label = 'YES'];
    'Randomly choos a number between 0-1. Is this number smaller than the probability of takeover for the confict category**?' -> 'Nothing happens' [label = 'NO'];
    'Nothing happens' -> 'Are there any more societies remaining (excluding new societies generated during this loop) that did not go through the expansion process?';
    'Are there any more societies remaining (excluding new societies generated during this loop) that did not go through the expansion process?' ->  'output to data hub' [label = 'NO'];
    'Attempt to dispersal' -> 'Randomly choos an empty target location';
    'Randomly choos an empty target location' -> 'Randomly choos a number between 0-1. Is this number smaller than the probabilityy of speciation for the spreader societ trait in the pre=expanding location environment?';
    'Randomly choos a number between 0-1. Is this number smaller than the probabilityy of speciation for the spreader societ trait in the pre=expanding location environment?' -> 'Speader occupies the target cell and bifurcate in the phylogeny' [label = 'YES'];
    'Randomly choos a number between 0-1. Is this number smaller than the probabilityy of speciation for the spreader societ trait in the pre=expanding location environment?' -> 'Nothing happens' [label = 'NO'];
  'Are there any more societies remaining (excluding new societies generated during this loop) that did not go through the expansion process?' ->  'Is there more than one societ in the world?' [label = 'YES'];
  'Speader occupies the target cell and bifurcate in the phylogeny' -> 'Are there any more societies remaining (excluding new societies generated during this loop) that did not go through the expansion process?';
  'Spreader removers target from the space and phylogeny' -> 'Speader occupies the target cell and bifurcate in the phylogeny';
  }")

Module 1: Simulation to produce a world and a tree

# Install the most recent version of FARM from a .zip file
install.packages(file.choose(), repos=NULL) 
library(FARM)
ls("package:FARM")

Inputs

Module 1 functions

The first set of RunSim functions are the default pipeline where only one output is saved at the end of the simulation.

This first function controls error messages coming from the primary function below.

# Run the simulation function skiping the erros and atributing NA if it occurs
RunSimUltimate <- function(myWorld, P.extinction, P.speciation,
                           P.diffusion, P.Arisal, P.TakeOver, nbs, independent,
                           N.steps, multiplier,
                           silent = TRUE, start = NULL) {

  result <- try(RunSim(myWorld, P.extinction, P.speciation,
                       P.diffusion, P.Arisal, P.TakeOver, nbs,
                       independent, N.steps,
                       multiplier, start = start), silent = silent)
  if (class(result) == "try-error") {
    result <- NA
  }
  return(result)
}

This is the primary function running the simulation.

#==================================================================
# SimulationFunctions.R
#
# Contains a function for simulation of cultural evolution in space and time
# Allows for (1) Vertical Transmission (phylogenetic inheritance); (2) Horizontal
# Transmission (cultural diffusion); (3) Ecological selection (Both speciation and
# extinction are determined by the match between the state of a binary trait and the
# environment a societuy occupies).
#
# 7 Jun 2016
# Carlos A. Botero, Bruno Vilela & Ty Tuff
# Washington University in Saint Louis
#==================================================================
RunSim <- function(myWorld, P.extinction, P.speciation,
                   P.diffusion, P.Arisal, P.TakeOver, nbs, independent,
                   N.steps, multiplier, start) {
  # myWorld = The hexagonal world created with the function BuildWorld
  # P.extinction = Probability matrix of extinction
  # P.speciation = Probability matrix of speciation
  # P.diffusion = Probability matrix of diffusion
  # P.Arisal = Probability matrix of arisal
  # P.TakeOver = Probability matrix of takeover
  # N.steps = Number of steps in the model
  # multiplier = The number that will multiply the probabilities according
  # to environmetal fitness.
  # start = the point ID in 'myWorld' that will give risen to humans.
  # (humans origin will be in one of the existing positions)

  world.size <- nrow(myWorld)
  # Initialize parameters we will use later to build the phylogeny
  rootnode <-  world.size + 1 # standard convention for root node number

  # set the seed for simulation
  if (is.null(start)) {
  start <- sample(1:world.size, 1)
  }

  myWorld[start, 4:6] <- c(0, 0, 1) # Setting root(0), time(0), ancestral(1, forager)

  mytree <- TheOriginOfSpecies(world.size, start) # Empty tree
  myT <- 0 # Time starts at zero

  # Common input and output for all the internal modules
  input <- list(P.speciation, P.Arisal, P.diffusion, P.extinction, P.TakeOver,
                myWorld, mytree, myT, multiplier, nbs, independent)

  # Functions order to be randomized
  rand_order_func_run <- list("Extinction", "Diffusion", "SpeciationTakeOver", "Arisal")

  cat("0% [") # Time count

  for (steps in 1:N.steps) { # Starts the loop with 'n' steps

    if (steps %% round((N.steps / 10)) == 0) { # Time count
      cat('-') # Time count
    }# Time count
    if (steps == N.steps) { # Time count
      cat("] 100 %\n")# Time count
    }# Time count

    # Randomize functions order
    rand_order <- sample(rand_order_func_run)
    # Run the functions
    input <- do.call(rand_order[[1]], list(input = input))
    input <- do.call(rand_order[[2]], list(input = input))
    input <- do.call(rand_order[[3]], list(input = input))
    input <- do.call(rand_order[[4]], list(input = input))

  }
  # Trunsform the input/output into the final result and return it
  myWorld <- as.data.frame(input[[6]])
  myWorld[, 8] <- paste0("t", myWorld[, 8])
  mytree <- makePhy(input[[7]])
  mytree$edge.length <- mytree$edge.length / N.steps
  return(list('mytree' = mytree, 'myWorld' = myWorld))
}

Push versions

# Run the simulation function skiping the erros and atributing NA if it occurs
RunSimUltimate.push <- function(myWorld, P.extinction, P.speciation,
                                P.diffusion, P.Arisal, P.TakeOver, nbs, independent,
                                N.steps, multiplier,
                                silent = TRUE, start = NULL) {

  result <- try(RunSim.push(myWorld, P.extinction, P.speciation,
                            P.diffusion, P.Arisal, P.TakeOver, nbs,
                            independent, N.steps,
                            multiplier, start = start), silent = silent)
  if (class(result) == "try-error") {
    result <- NA
  }
  return(result)
}
#==================================================================
# SimulationFunctions.R
#
# Contains a function for simulation of cultural evolution in space and time
# Allows for (1) Vertical Transmission (phylogenetic inheritance); (2) Horizontal
# Transmission (cultural diffusion); (3) Ecological selection (Both speciation and
# extinction are determined by the match between the state of a binary trait and the
# environment a societuy occupies).
#
# 7 Jun 2016
# Carlos A. Botero, Bruno Vilela & Ty Tuff
# Washington University in Saint Louis
#==================================================================
RunSim.push <- function(myWorld, P.extinction, P.speciation,
                   P.diffusion, P.Arisal, P.TakeOver, nbs, independent,
                   N.steps, multiplier, start) {
  # myWorld = The hexagonal world created with the function BuildWorld
  # P.extinction = Probability matrix of extinction
  # P.speciation = Probability matrix of speciation
  # P.diffusion = Probability matrix of diffusion
  # P.Arisal = Probability matrix of arisal
  # P.TakeOver = Probability matrix of takeover
  # N.steps = Number of steps in the model
  # multiplier = The number that will multiply the probabilities according
  # to environmetal fitness.
  # start = the point ID in 'myWorld' that will give risen to humans.
  # (humans origin will be in one of the existing positions)

  world.size <- nrow(myWorld)
  # Initialize parameters we will use later to build the phylogeny
  rootnode <-  world.size + 1 # standard convention for root node number

  # set the seed for simulation
  if (is.null(start)) {
    start <- sample(1:world.size, 1)
  }

  myWorld[start, 4:6] <- c(0, 0, 1) # Setting root(0), time(0), ancestral(1, forager)

  mytree <- TheOriginOfSpecies(world.size, start) # Empty tree
  myT <- 0 # Time starts at zero

  # Common input and output for all the internal modules
  input <- list(P.speciation, P.Arisal, P.diffusion, P.extinction, P.TakeOver,
                myWorld, mytree, myT, multiplier, nbs, independent)

  # Functions order to be randomized
  rand_order_func_run <- list("Extinction", "Diffusion",
                              "SpeciationTakeOver.push", "Arisal")

  cat("0% [") # Time count

  for (steps in 1:N.steps) { # Starts the loop with 'n' steps

    if (steps %% round((N.steps / 10)) == 0) { # Time count
      cat('-') # Time count
    }# Time count
    if (steps == N.steps) { # Time count
      cat("] 100 %\n")# Time count
    }# Time count

    # Randomize functions order
    rand_order <- sample(rand_order_func_run)
    # Run the functions
    input <- do.call(rand_order[[1]], list(input = input))
    input <- do.call(rand_order[[2]], list(input = input))
    input <- do.call(rand_order[[3]], list(input = input))
    input <- do.call(rand_order[[4]], list(input = input))

  }
  # Trunsform the input/output into the final result and return it
  myWorld <- as.data.frame(input[[6]])
  myWorld[, 8] <- paste0("t", myWorld[, 8])
  mytree <- makePhy(input[[7]])
  mytree$edge.length <- mytree$edge.length / N.steps
  return(list('mytree' = mytree, 'myWorld' = myWorld))
}

Module 2: Space and phylogeny summary statistics

This is the master document for Module 2, a foundational function in our FARM package that analyzes results from Module 1, the other foundational function. Module 1 simulates a spatial pattern and a phylogenetic tree given a set of environmental and inheritance rules and then Module 2 summarizes those simulated results using a large set of targeted summary statistics. Here we describe our choice of summary statistics, justify those choices as part of a larger theoretical context, and provide our reproducable code for executing the anaylses yourself. These two parts are seperated into modules so that they can act independently. An combination of spatial pattern and associated phylogeny many be used as long as they are formatted correctly.

This pipeline was designed to analyze a simulated world where all the information is known about both the world and the tree. There is no missing information, just extinct trees. This is much different than our real tree that has loads of uncertainty unevenly destributed across it. The result you see demonstrated right now are one simulated result of many. I need to do a sister page to this were we do this entire analysis on the real tree, or best real tree we’ve got.

We have four types of data available for asking research questions using D-place data: phylogenies, spatial locations, trait identities, and environmental reconstructions. Any one of these four data types alone are relatively information poor, so we are searching for ways to model connections between these data types to draw stronger conclusions overall.

Other modules can use the summary statistics generated from this module to test hypotheses. We currently have a ABC and Random Forest module started but there will be more to come.

These are quantitative connections that we are assumed in the analyses, but we don’t actually have any support in the data for doing so. 1. nearest neighbor connectivity measures 2. Abundance estimates 3. Pairwise influence (history) between cultures. 4. Environmental reconstruction validation evidence

Phylogenetic summary statistics

Whole tree vs. part of tree? These statistics are generally used to compare one sample to another. For example, an experimental contrast between two sites, two phylogenetic groups, or two communities in two different locations. Here we are calculating these statistics for the global langauage tree to compare against global trees created in our simulation. You still retain the ability to subset this tree or others and send only those subsets through this code to compare the values with each other afterwards.

  1. Branch Length (richness and divergence)
  2. Pairwise distance between tips (richness, divergence, and regularity)
  3. Phylogenetic isolation (divergence, and regularity)
  4. Nearest Neighbor (divergence, and regularity)

All trees are ultrametric.

Introduction and framework

The choice of phylogenetic analyses and organizational scheme is based on the suggestions of Tucker et al. (2016). Here are a few images from that paper for an overview:

From Tucker et al. (2016) From Tucker et al. (2016)

From Tucker et al. (2016)

From Tucker et al. (2016)

From Tucker et al. (2016)

From Tucker et al. (2016)

library(knitr)
library(phytools)
library(FARM)
library(ROCR)
library(spdep)
load('~/Downloads/download.Rdata')

this_tree <- myOut$mytree
this_world <- myOut$myWorld
str(this_world)
str(this_tree)

Alpha diversity metrics

Branch Lengths

Branch length data is embedded in the tree object provided to this function. The first step in summarizing the lengths is to extract those data from the tree object. These data are called ‘edges’ in the tree object. We extract branch lengths and create an object called ‘Branch_lengths’ for passing on to the other summary functions. The histogram below shows the frequency of different branch lengths found throughout the tree.

Branch_Lengths <- this_tree$edge.length

We can summarize branch lengths according to normal summary statistics, but it can be difficult to assign evolutionary meaning to some of these metrics and so they are not regularly used as best I can tell. This lack of meaning does not mean that these statistics couldn’t be used to distinguish between large simulated trees.

mean_branch_length <- mean(Branch_Lengths)
variance_branch_length <- var(Branch_Lengths)
SD_branch_length <- sd(Branch_Lengths)

Phylogenetic diversity (\(PD\))

Phylogenetic diversity (\(PD\)) is the summation (\(\sum\)) of all branch lengths connecting species together, where \(B_{t}\) is the set of included tips and \(L_{b}\) is Branch lengths (Faith 1992). This is an anchor test, which means it is regularly used, well understood, and we should use it to anchor our work to past work. PD is a richness measure, it tells us how much evolutionary history is associated with a set of tips.

\[PD = \sum_{b \in B_{t}}^{}L_{b}\]

# Anchor test = PD (Faith's phylogenetic diversity)
Pylo_diversity_is_sum_of_BL <- sum(Branch_Lengths)
Pylo_diversity_is_sum_of_BL

There are variations on this measure that we have NOT implemented here. It is popular to scale this measure according to some ecological driver. Barker (2002) scales branch lengths (\(L_{b}\)) by multiplying them against the abundance of individuals at at tip (\(A_{b}\)). Others (D. F. Rosauer et al. 2009), scale them by their range size instead (\(R_{b}\)).

\[\Delta n PD = \sum_{b \in B_{t}}^{}A_{b}L_{b}\] \[PE = \sum_{b \in B_{t}}^{}\dfrac{L_{b}}{R_{b}}\] Argueing that proportional abundance phylogenetic diversity (\(PD_{Ab}\)) is more effective than the standard PD calculated from raw abundance, Vellend et al. (2011) penned a new version of PD where \(B\) is the total number of branch lengths (\(L_{b}\)). Note: We don’t have abundance data right now for the human project so this metric is not currently very helpful.
\[PD_{Ab} = B * \dfrac{\sum_{b \in B_{t}}^{}A_{b}L_{b}}{\sum_{b \in B_{t}}^{}A_{b}}\]

#Calculate B
number_of_branches <- length(Branch_Lengths)
number_of_branches

Average phylogenetic diversity (\(avPD\))

Average phylogenetic diversity (\(avPD\)) (Clarke and Warwick 2001) is a branch length-based divergence indices where PD is divided by the total number of tips (\(S\)) in the tree. \[avPD = \dfrac{PD}{S}\]

Number_of_tips <- length(this_tree$tip.label)
average_phylogenetic_diversity <- Pylo_diversity_is_sum_of_BL / Number_of_tips
average_phylogenetic_diversity

There is also a proportional abundance version of average phylogenetic diversity (\(avPD_{Ab}\)) (Tucker et al. 2016). Again, we don’t have abundance values yet for D-place. \[avPD_{Ab} = \dfrac{B * \dfrac{\sum_{b \in B_{t}}^{}A_{b}L_{b}}{\sum_{b \in B_{t}}^{}A_{b}}}{S}\]

Pairwise distance between tips

This is the patristic distance, the sum of the branch lengths following the shortest distance between two tips in a tree, implemented as a distance matrix where every tip is compared to every other tip. This distance function can be anything. We use euclidean and environmental distance matrices heavily in the spatial analyses.

Calculate the patristic distance between two taxa, for all taxa

Calculate the patristic distance between two taxa using the R package ‘phytools’, this function takes a ‘phylo’ tree object and returns a distance matrix between tips. Need original citation.

## Pairwise distance between tips - From library(ape) in library(phytools)
Pairwise_dist <- cophenetic(this_tree)

yields a distance matrix (list of 2D matrices) of all distances between taxa.

Sum of all pairwise distances (\(F\))

Now we can use a set of summary statistics to describe those pairwise distances. The sum of all pairwise distances, \(F\), is formally called ‘Extensive quadratic entropy’. (???). Just as it was with branch lengths, this is a richness measure and, accordingly, should be used to answer richness questions.

\[F = \sum_{i} \sum_{j} d_{ij}\]

# F -- Extensive quadratic entropy
F_quadratic_entropy_is_sum_of_PD <- sum(Pairwise_dist)
F_quadratic_entropy_is_sum_of_PD

Mean pairwise distance (MPD)

Mean inter-species distances. The mean of all pairwise distances, \(MPD\) (a.k.a. \(AvTD\), and \(\Delta^{+}\)), is the mean distance between species. (Clarke and Warwick 1998; Webb et al. 2002; Webb, Ackerly, and Kembel 2008; Kembel et al. 2010). \[MPD = \dfrac{\sum_{ij} d_{ij}}{S(S-1)}\]

# Anchor test = MPD (mean pairwise distance)
Mean_pairwise_distance <- 
  Pairwise_dist / (Number_of_tips * (Number_of_tips - 1) ) 

MPD anchored to the root

There is an extention to mean pairwise distance calculations from Helmus et al. (2010) called \(PSV\), \(PSR\), and \(PSE\), phylogenetic species variability, phylogenetic species richness, and phylogenetic species evenness. These measures take the basic pairwise distance calculations and anchor them to the root of the tree so distances have a common denominator. This extention is implemented by using the same equations, just with a constrained set of \(d_{ij}\) conditions. Specifically,

\[PSV = MPD = \dfrac{\sum_{ij} d_{ij}}{S(S-1)}\] \[PSR = \sum_{i} {(\dfrac{1}{S-1} \sum_{j} {d_{ij}})}\] \[PSE = \dfrac{S}{S-1} \sum_{ij} d_{ij}p_{i}p_{j}\]

with these specific values of \(d_{ij}\)

\[ d_{ji}=0.5*(c_{ii} + c_{jj} - c_{ij}) \\ \ \\ or \\ \ \\ d_{ij} = 1 - c_{ij} / (\sqrt{c_{ii}c_{jj}}) \] and \[ c_{ii} = the \ sum \ of \ branch \ lengths \ from \ tip \ i \ to \ the \ root \ of \ the \ phylogenetic \ tree. \\ \ \\ c_{ij} = the \ sum \ of \ branch \ lengths \ from \ first \ common \ ancestor \ for \ i \ and \ j \ to \ the \ root. \]

Average distance between two randomly chosen species

\(J\), Intensive quadratic entropy, which is the average distance between two randomly chosen species (???) \[J = \dfrac{\sum_{ij}d_{ij}}{S^2} \]

Simpson’s diversity index for pairwise distance

There has been a long effort to pen a phylogenetic analogy to a Simpson’s diversity index. (Rao 1982; Clarke and Warwick 1998; S. Ã. Pavoine, Ollier, and Pontier 2005; Hardy and Senterre 2007; Webb et al. 2002; Webb, Ackerly, and Kembel 2008; Kembel et al. 2010). The conclusion seems to be that this measure is equivilent to scaling \(MPD\) by abundance \(p_{i}\) and \(p_{j}\) to get \(MPD_{Ab}\). This is also a special case of Rao’s Quadratic Entropy, \(Roe's QE\). Note: not using abundance measures yet for D-place data.

\[MPD_{Ab} = \sum_{i} \sum_{j} d_{ij} p_{i} p_{j}\]

Interspecific comparisons of pairwise distances

The interspecific variant (rather than the intraspecific default described above) defines the expected phylogenetic distance between two indivdiuals randomely drawn conditionally on the fact that they indivdulas from different species. \[InterMPD_{Ab} = \dfrac{\sum_{i} \sum_{j \ne i} d_{ij} p_{i} p_{j} }{\sum_{i} \sum_{j \ne i} d_{ij} p_{i} p_{j}} \]

Variance in pairwise distances (\(VPD\))

Variance in pairwise distances, \(VPD\) (a.k.a. \(VarTD\) and \(\Lambda^+\)), is a regularity indices. Clarke and Warwick (2001) Variance is relative to tips, \(S\), not to total branches (\(B\) from above). These are the residuals, they compare each individual pairwise connection to the overall mean.

\[VPD = \dfrac{1}{S(S-1)} (\sum_{i} \sum_{j \ne i} {(d_{ij} - MPD)^2})\]

#need to adjust to equation above!

#Pairwise distance/all distances -- Variance of pairwise distances

# Anchor test = VPD (variation of pairwise distance)  

variance_pairwise_distance <- var(as.vector(Pairwise_dist))

Variants of \(VPD\) are \(VPD_{ab}\) and \(InterPVD_{Ab}\), where variance is scaled by abundance or compared in and out of species. These are also regularity indices.

\[ VPD_{Ab}= (\sum_{i} \sum_{j} n_{i} n_{j}) * \dfrac{\sum_{i} \sum_{j} n_{i} n_{j} (d_{ij} - MPD_{Ab})^2} {(\sum_{i} \sum_{j} n_{i} n_{j})^2 - \sum_{i} \sum_{j} (n_{i} n_{j})^2} \\ or \\ InterVPD_{Ab} = (\sum_{i} \sum_{j \ne i} n_{i} n_{j}) * \dfrac{\sum_{i} \sum_{j \ne i} n_{i} n_{j} (d_{ij} - InterMPD_{Ab})^2} {(\sum_{i} \sum_{j \ne i} n_{i} n_{j})^2 - \sum_{i} \sum_{j \ne i} (n_{i} n_{j})^2} \]

Nearest phylogenetic neighbor

Divergence indices

Divergence indices using nearest distance: \(MNTD\) and \(MNTD_{Ab}\), Mean nearest taxon distance and Abundance-weighted MNTD (Webb et al. 2002; Webb, Ackerly, and Kembel 2008; Kembel et al. 2010).

\(MNTD\), mean nearest taxon distance, is the mean shortest distance from a species to all other in the assemblage (Webb et al. 2002; Webb, Ackerly, and Kembel 2008; Kembel et al. 2010).
\[ MNTD = \dfrac{1}{S} \sum_{i} d_{i_{min}} \]

\(MNTD_{AB}\), abundance adjusted mean nearest taxon distance. Adjusted by species proportions (i.e. species’ relative abundances) (Webb et al. 2002; Webb, Ackerly, and Kembel 2008; Kembel et al. 2010)
\[ MNTD_{Ab} = \sum_{i=1}^{S} [d_{i_{min}} * p_{i}] \]

Regularity indices

Regularity indices using nearest distances: \(VNTD\), \(VNTD_{Ab}\), \(PE_{ev}\).

\(VNTD\), Variance in nearest taxon distances, is the variance in nearest pairwise distance (Tucker et al. 2016). \[ VNTD = \dfrac{1}{S} \sum_{i-1}^{S} [(d_{i_{min}} - MNTD)^2] \] \(VNTD_{Ab}\), Abundance weighted variance in nearest taxon distances, is scales by abundance in the same way as descried above (Tucker et al. 2016). \[ VNTD_{Ab} = \dfrac {(\sum_{i} n_{i}) \sum_{i} n_{i} (d_{i_{min}} - MNTD_{Ab})^2} {(\sum_{i} n_{i})^2 - \sum_{i} n_{i} ^2} \]

Phylogenetic version of the funtional \(FE_{ve}\) index

\(PE_{ve}\), phylogenetic evenness is a phylogenetic version of the funtional \(FE_{ve}\) index. First a minimum spanning tree (\(MST\)) is computed using the cophenetic distance obtained from the phylogenetic tree. The \(MST\) contains \(S-1\) Branches connection the \(S\) species. We denote \(l\) a branch on the \(MST\), \(dist(i,j)\) is the length the branch \(l\) that connects species \(i\) and \(j\). \(n_{i}\) is, as defined above, the abundance of species \(i\) in the asseblage (Villeger, Mason, and Mouillot 2008; Dehling et al. 2014).

\[ Weighted \ evenness: \\ EW_{i} = \dfrac{dist(i,j)} {(n_{i} + n_{j})/(\sum_{k=1}^{S}n_{k})} \\ \ \\ Partial \ weighted \ evenness: \\ PEW_{l} = \dfrac {EW_{l}} {\sum_{l=1}^{S-1} EW_{l}} \\ \ \\ Phylogenetic \ evenness: \\ PE_{ve} = \dfrac {\sum_{l=1}^{S-1} min(PEW_{l}, \dfrac{1}{S-1}) - (\dfrac{1}{S-1})} {1- (\dfrac{1}{S-1})} \]

Phylogenetic isolation

A phylogenetic isolation index represents the relative isolation of a given species within a phylogenetic tree. Several indices have been proposed so far but we focus here on the evolutionary distinctiveness index called ‘Fair Proportion’ as proposed by (???) and Isaac et al. (2007).

Evolutionary distinctiveness (richness indices)

\(ED\), evolutionary distinctiveness is a richness indices. NOTE: not equal to Faith’s PD because the \(ED_{i}\) are computed from the regional pool of species and sumed across a given assemblage (i.e. a subset of the regional species pool) (Tucker et al. 2016; Safi et al. 2013; ???; Isaac et al. 2007).

\[ ED = \sum_{i}ED_{i} \\ \ \\ where \ ED_{i} = \sum_{b \in B_{t_{i}}} \dfrac{L_{b}}{S_{b}} \]

\(AED\), Abundance-weighted \(ED\) (Tucker et al. 2016; Cadotte et al. 2010). \[ \sum_{i} AED_{i} \\ \ \\ where \ AED_{i} = \sum_{b \in B_{t_{i}}} \dfrac{L_{b}}{A_{b} }* p_{i} \]


# Bruno's function for ED. Provided in library(FARM)

evol.distinct2 <- function (tree, type = c("equal.splits", "fair.proportion"), 
    scale = FALSE, use.branch.lengths = TRUE) 
{
    type <- match.arg(type)
    if (is.rooted(tree) == FALSE) 
        warning("A rooted phylogeny is required for meaningful output of this function", 
            call. = FALSE)
    if (scale == TRUE) {
        if (is.ultrametric(tree) == TRUE) 
            tree$edge.length <- tree$edge.length/(as.numeric(branching.times(tree)[1]))
        else tree$edge.length <- tree$edge.length/sum(tree$edge.length)
    }
    if (use.branch.lengths == FALSE) 
        tree$edge.length <- rep(1, length(tree$edge.length))
    for (i in 1:length(tree$tip.label)) {
        spp <- tree$tip.label[i]
        nodes <- .get.nodes(tree, spp)
        nodes <- nodes[1:(length(nodes) - 1)]
        internal.brlen <- tree$edge.length[which(tree$edge[, 
            2] %in% nodes)]
        if (length(internal.brlen) != 0) {
            internal.brlen <- internal.brlen * switch(type, equal.splits = sort(rep(0.5, 
                length(internal.brlen))^c(1:length(internal.brlen))), 
                fair.proportion = {
                  for (j in 1:length(nodes)) {
                    sons <- .node.desc(tree, nodes[j])
                    n.descendents <- length(sons$tips)
                    if (j == 1) portion <- n.descendents else portion <- c(n.descendents, 
                      portion)
                  }
                  1/portion
                })
        }
        ED <- sum(internal.brlen, tree$edge.length[which.edge(tree, 
            spp)])
        if (i == 1) 
            w <- ED
        else w <- c(w, ED)
    }
    return(w)
}

Evolutionary distinctiveness is our basic measure of phylogenetic isolation. #This should likely be ‘fair proportions’ instead of ‘equal.splits’.


# Calculate ED
# Using equal.splits method, faster computation
# Evolutionary_distinctiveness_i <- evol.distinct2(this_tree, type = "equal.splits")  

# ED - Summed evolutionary distinctiveness
# Evolutionary_distinctiveness_sum <- sum(Evolutionary_distinctiveness_i)
#Evolutionary_distinctiveness_sum

We can run some standard summary statistics (mean and variance) on this ED measure. var(Ed) shows up close to VPD on the PCAs in the intro (Tucker et al. 2016).

# mean(ED)
# mean_Phylogenetic_isolation <- mean(Evolutionary_distinctiveness_i)

# var(ED)
#variance_Phylogenetic_isolation <- var(Evolutionary_distinctiveness_i)
#mean_Phylogenetic_isolation
#variance_Phylogenetic_isolation

Mean evolutionary distinctiveness (divergence indices)

The divergence indices version for \(ED\) is mean evolutionary distinctiveness, \(MED\). The mean of evolutionary distinctiveness (???; Isaac et al. 2007). \[ MED = \dfrac {\sum_{i} ED_{i}} {S} \\ \ \\ with \\ \ \\ ED_{i} = \sum_{b \in B_{t_{i}}} \dfrac{L_{b}}{S_{b}} \] #### Entropy measure of evolutonary distinctiveness (regularity indices) The regularity indices for \(ED\)/phylogenetic isolation are \(H_{ED}\), \(E_{ED}\), \(var(ED)\), \(H_{AED}\)

\(H_{ED}\), Entropy measure of evolutionary distinctiveness, is the shannon index applied to evolutionary distinctiveness values (Cadotte et al. 2010). \[ H_{ED} = -\sum_{i=1}^{S} ((\dfrac{ED_{i}}{\sum_{i=1}^{S} ED_{i}}) * \ln (\dfrac{ED_{i}}{\sum_{i=1}^{S} ED_{i}})) \]

\(E_{ED}\), Equitability of evolutionary distinctiveness, is \(H_{ED}\) controlled for species richness (Cadotte et al. 2010).

\[ E_{ED} = \dfrac{H_{ED}}{\ln(S)} \] \(var(ED)\), Variance in evolutinoary distinctiveness, is the variance of species evolutionary distinctiveness (Tucker et al. 2016).

\[ var(ED) = \dfrac{1}{S-1} * \sum_{i=1}^{S} (ED_{i}-\dfrac{\sum_{i=1}^{S} ED_{i}}{S})^2 \] \(H_{ED_{Ab}}\), Abundance-weighted version of \(H_{ED}\) (Cadotte et al. 2010).

\[ H_{ED_{Ab}} = -\sum_{i=1}^{S} (\dfrac{n_{i}AED_{i}}{\sum_{i=1}^{S} n_{i}AED_{i}} * \ln(\dfrac{n_{i}AED_{i}}{\sum_{i=1}^{S} n_{i}AED_{i}})) \]

Beta diversity

We currently are not using any beta diversity metrics but there are many to choose from if we decide to add them later.

Tree topology

Tree topology is a measure of the shape of the overall tree. The tree can be lopsided side-to-side or front-to-back.

Our most trusted index for the tippy vs trunky of a tree is the gamma index, \(\gamma\).The index characterizes the distribution of branching events within the tree. Trees with γ < 0 have relatively longer branches towards the tips of the phylogeny (tippy trees), whereas trees with γ > 0 have relatively longer inter-nodal distances towards the root of the phylogeny (stemmy trees). tk represents an ‘evolutionary period’ (limits are given by two speciation events) or equivalently an internode distance (Pybus and Harvey 2000).

\[ \gamma = \dfrac {(\dfrac{1}{S-2}* \sum_{i=2}^{S-1} (\sum_{k=2}^{i} Kt_{k}))- \dfrac{1}{2} * \sum_{j=2}^{S} jt_{j}} {(\sum_{j=2}^{S} jt_{j}) * \sqrt{\dfrac{1}{12*(S-2)}}} \]

# ltt function from library(phytools)
ltt <- function (tree, plot = TRUE, drop.extinct = FALSE, log.lineages = TRUE, 
    gamma = TRUE, ...) 
{
    tol <- 1e-06
    if (!inherits(tree, "phylo") && !inherits(tree, "multiPhylo")) 
        stop("tree must be object of class \"phylo\" or \"multiPhylo\".")
    if (inherits(tree, "multiPhylo")) {
        obj <- lapply(tree, ltt, plot = FALSE, drop.extinct = drop.extinct, 
            log.lineages = log.lineages, gamma = gamma)
        class(obj) <- "multiLtt"
    }
    else {
        tree <- reorder.phylo(tree, order = "cladewise")
        if (!is.null(tree$node.label)) {
            node.names <- setNames(tree$node.label, 1:tree$Nnode + 
                Ntip(tree))
            tree$node.label <- NULL
        }
        else node.names <- NULL
        if (is.ultrametric(tree)) {
            h <- max(nodeHeights(tree))
            time <- c(0, h - sort(branching.times(tree), decreasing = TRUE), 
                h)
            nodes <- as.numeric(names(time)[2:(length(time) - 
                1)])
            ltt <- c(cumsum(c(1, sapply(nodes, function(x, y) sum(y == 
                x) - 1, y = tree$edge[, 1]))), length(tree$tip.label))
            names(ltt) <- names(time)
        }
        else {
            drop.extinct.tips <- function(phy) {
                temp <- diag(vcv(phy))
                if (length(temp[temp < (max(temp) - tol)]) > 
                  0) 
                  pruned.phy <- drop.tip(phy, names(temp[temp < 
                    (max(temp) - tol)]))
                else pruned.phy <- phy
                return(pruned.phy)
            }
            if (drop.extinct == TRUE) 
                tree <- drop.extinct.tips(tree)
            root <- length(tree$tip) + 1
            node.height <- matrix(NA, nrow(tree$edge), 2)
            for (i in 1:nrow(tree$edge)) {
                if (tree$edge[i, 1] == root) {
                  node.height[i, 1] <- 0
                  node.height[i, 2] <- tree$edge.length[i]
                }
                else {
                  node.height[i, 1] <- node.height[match(tree$edge[i, 
                    1], tree$edge[, 2]), 2]
                  node.height[i, 2] <- node.height[i, 1] + tree$edge.length[i]
                }
            }
            ltt <- vector()
            tree.length <- max(node.height)
            n.extinct <- sum(node.height[tree$edge[, 2] <= length(tree$tip), 
                2] < (tree.length - tol))
            node.height[tree$edge[, 2] <= length(tree$tip), 2] <- node.height[tree$edge[, 
                2] <= length(tree$tip), 2] + 1.1 * tol
            time <- c(0, node.height[, 2])
            names(time) <- as.character(c(root, tree$edge[, 2]))
            temp <- vector()
            time <- time[order(time)]
            time <- time[1:(tree$Nnode + n.extinct + 1)]
            for (i in 1:(length(time) - 1)) {
                ltt[i] <- 0
                for (j in 1:nrow(node.height)) ltt[i] <- ltt[i] + 
                  (time[i] >= (node.height[j, 1] - tol) && time[i] <= 
                    (node.height[j, 2] - tol))
            }
            ltt[i + 1] <- 0
            for (j in 1:nrow(node.height)) ltt[i + 1] <- ltt[i + 
                1] + (time[i + 1] <= (node.height[j, 2] + tol))
            names(ltt) <- names(time)
            ltt <- c(1, ltt)
            time <- c(0, time)
            time[length(time)] <- time[length(time)] - 1.1 * 
                tol
        }
        if (!is.null(node.names)) {
            nn <- sapply(names(time), function(x, y) if (any(names(y) == 
                x)) 
                y[which(names(y) == x)]
            else "", y = node.names)
            names(ltt) <- names(time) <- nn
        }
        if (gamma == FALSE) {
            obj <- list(ltt = ltt, times = time, tree = tree)
            class(obj) <- "ltt"
        }
        else {
            gam <- gammatest(list(ltt = ltt, times = time))
            obj <- list(ltt = ltt, times = time, gamma = gam$gamma, 
                p = gam$p, tree = tree)
            class(obj) <- "ltt"
        }
    }
    if (plot) 
        plot(obj, log.lineages = log.lineages, ...)
    obj
}
<environment: namespace:phytools>
ltts <- ltt(this_tree, gamma = TRUE, plot = FALSE)
ltts
str(ltts)
lineages_through_time <- as.numeric(ltts[[1]])
time_steps <- as.numeric(ltts[[2]])
#extract Gamma index
gamma <- ltts[[3]]
gamma_p_value <- ltts[[4]]
lineages_through_time 
time_steps 
gamma 
gamma_p_value 

There are two other regularly used metrics that include abundance measures. Note: we don’t have abundance measures for D-place data.

\(IAC\), imbalance of abundance at the clade level, quantifies the relative deviation in the abundance distribution from a null case where individuals are evenly partitioned between clade splits. \(v\) is the number of nodes in the phylogenetic tree. \(n_{i}\) is, as defined above, the abundance of species \(i\) in the assemblage. \(\eta_{k}\) is the expected abundance species \(i\) would have if the abundance was randomly split among lineages in the phylogenetic tree at each speciation event. is the number of lineages originating at node \(k\) in the set \(s(k,root)\), which contains the nodes located on the path between node \(k\) and the root of the phylogenetic tree. N is the total assemblage abundance (Cadotte et al. 2010).

\[ \dfrac{\sum_{i=1}^{S} |n_{i} - \hat{n_{i}}|} {v} \\ \ \\ where \\ \ \\ \hat{n_{i}} = \dfrac{N}{\prod_{K \in s(i, root)}\eta_{k}} \]

\(I_{c}\), the Colless index, is the sum of the absolute differences in species richness between sister-clades at each internal node. For fully resolved trees, each internal node defines two sister-clades. \(S_{1k}\) is the number of species descending from the first clade defined by node k and \(S_{2k}\) that of the second clade. \(v\) is, as defined above, the number of nodes in the phylogenetic (Colless 1982).

\[ I_{c} = \sum_{k=1}^{v} |S_{1k} - S_{2k}| \]

Macroevolutionary rates


#function name = bd, function input = tree of type 'phylo'
bd <- function (tree) 
{
    tree$edge.length <- tree$edge.length/max(tree$edge.length)
    x <- birthdeath(tree)
    b <- x$para[2]/(1 - x$para[1])
    d <- b - x$para[2]
    c(setNames(c(b, d), c("b", "d")), x$para)
}
<environment: namespace:FARM>
 ## Speciation vs extinction rates and Net diversification
bds <- bd(this_tree)
speciation_rate <- bds[1]
extinction_rate <- bds[2]
extinction_per_speciation <- bds[3]
speciation_minus_extinction <- bds[4]
  
## Speciation vs extinction rates and Net diversification dependent on trait
# N.for.dom <- table(this_world[, 6])
#    if(length(N.for.dom) == 2) {
par.div.dep <- DivDep( mytree = this_tree, myWorld = this_world)
trait_1_speciation <- par.div.dep[1]
trait_2_speciation <- par.div.dep[2]
trait_1_extinction <- par.div.dep[3]
trait_2_extinction <- par.div.dep[4]
transition_from_trait_1_to_2 <- par.div.dep[5]
transition_from_trait_2_to_1 <- par.div.dep[6]
transition_rate_ratio_1to2_over_2to1 <- transition_from_trait_1_to_2/transition_from_trait_2_to_1
      

## Crown age per trait AUC and effect size
tip.length <- this_tree$edge.length[this_tree$edge[, 2] %in% 1:Ntip(this_tree)]
tip.length <- (tip.length - min(tip.length)) / (max(tip.length) - min(tip.length))
this_trait <- this_world[match(this_tree$tip.label, this_world[, 8]), 6]
tip.length.2 <- tip.length[this_trait == 2]
tip.length.1 <- tip.length[this_trait == 1]
model <- glm(as.factor(this_trait) ~ log(tip.length + 1),
             family = "binomial")
effect.size <- model$coefficients[2]
# plot(y = this_trait - 1, x= log(tip.length))
p <- predict(model, as.factor(this_trait), type = "resp")
# points(y = p, x = log(tip.length), col = "red")
pr <- prediction(p, as.factor(this_trait))
auc.model <- performance(pr, measure = "auc")@y.values[[1]]
## Phylogenetic signal (D)
Phylogenetic_signal <- Dsig(mytree = this_tree, myWorld = this_world)

Spatial Locations


## Spatial Analysis
nbs0 <- knearneigh(as.matrix(this_world[, 2:3]), k = 7, longlat = TRUE)
nbs <- knn2nb(nbs0, sym = TRUE) # 7 symmetric neighbors
nbs.listw <- nb2listw(nbs)
factors.nbs <- as.factor(ifelse(is.na(this_world[, 6]), 3, this_world[, 6]))
spatial.tests <- joincount.test(fx = factors.nbs, listw = nbs.listw)
spatial.tests.fora <- spatial.tests[[1]]$statistic
spatial.tests.dom <- spatial.tests[[2]]$statistic
#prevalence <- (N.for.dom[1] - N.for.dom[2]) / sum(N.for.dom)
results_summary_matrix_1 <- cbind(

        number_of_branches,
        #Pylo_diversity_is_sum_of_BL,
        #average_phylogenetic_diversity_is_mean_of_BL,
        #variance_Pylo_diversity_is_variance_of_BL,

        F_quadratic_entropy_is_sum_of_PD,
        Mean_pairwise_distance,
        variance_pairwise_distance,

        #Evolutionary_distinctiveness_sum,
        #mean_Phylogenetic_isolation,
        #variance_Phylogenetic_isolation,

        gamma,
        gamma_p_value,
        speciation_rate,
        extinction_rate,
        extinction_per_speciation,
        speciation_minus_extinction,
        trait_1_speciation,
        trait_2_speciation ,
        trait_1_extinction ,
        trait_2_extinction ,
        transition_from_trait_1_to_2 ,
        transition_from_trait_2_to_1 ,
        transition_rate_ratio_1to2_over_2to1 ,
        Phylogenetic_signal,
        spatial.tests.fora,
        spatial.tests.dom,
       # prevalence,
       # auc.model,
        effect.size
      )
      #rownames(results_summary_matrix_1) <- 1

      #results_summary_matrix_2 <- cbind(
      #  c(Evolutionary_distinctiveness,NA),
      #  lineages_through_time,
      #  time_steps
      #)
      #colnames(results_summary_matrix_2) <- c("Evolutionary_distinctiveness", "lineages_through_time", "time_steps")
      #head(results_summary_matrix_2)

      ### Returns from function in list form
      #returns <- list(
        #Branch_Lengths,
        #Pairwise_dist,
      #  results_summary_matrix_1,
      #  results_summary_matrix_2

      #)

      #names(returns) <- c(
        #"Branch_Lengths",
        #"Pairwise_distance",
       # "results_summary_of_single_value_outputs",
       # "results_summary_matrix_of_multi_value_outputs"
      #)
      
      

Module2() returns these two matrices as a list

Here is the exact version in R

## This module analyzes the results from module 1 and returns a list based on how many values each stat returns
## Ty Tuff and Bruno Vilela
## 24 August 2016

###### Specify function ##############################

Module_2 <- function(Module_1_output) {
  cat("\nAnalyzing: 0% [")
  if (any(is.na(Module_1_output))) {
    cat("----------]")
    return(NA)
  } else {

    this_tree <- Module_1_output$mytree
    this_world <- Module_1_output$myWorld

    ##### (0) Pull necessary variables from simulated trees and organize into a single object for all the tests below to pull from.

    #str(all_trees)
    #str(this_tree)

    ## 0a) Branch lengths
    Branch_Lengths <- this_tree$edge.length
    number_of_branches <- length(Branch_Lengths)

    # Anchor test = PD (Faith's phylogenetic diversity)
    Pylo_diversity_is_sum_of_BL <- sum(Branch_Lengths)

    # avPD -- Average phylogenetic diversity
    average_phylogenetic_diversity_is_mean_of_BL <- mean(Branch_Lengths)

    variance_Pylo_diversity_is_variance_of_BL <- var(Branch_Lengths)
    cat("-")
    ## 0b) Pairwise distance between tips
    Pairwise_dist <- cophenetic.phylo(this_tree)
    cat("-")
    # 2b) Pairwise distance -- Sum of pairwise distances

    # F -- Extensive quadratic entropy
    F_quadratic_entropy_is_sum_of_PD <- sum(Pairwise_dist)

    #Mean inter-species distances

    # Anchor test = MPD (mean pairwise distance)

    Mean_pairwise_distance <- mean(Pairwise_dist)

    cat("-")
    #Pairwise distance/all distances -- Variance of pairwise distances

    # Anchor test = VPD (variation of pairwise distance)

    variance_pairwise_distance <- var(as.vector(Pairwise_dist))

    ## 0c) Phylogenetic isolation

    # Using equal.splits method, faster computation
    Evolutionary_distinctiveness <- evol.distinct2(this_tree, type = "fair.proportion")
    cat("-")
    # ED - Summed evolutionary distinctiveness

    Evolutionary_distinctiveness_sum <- sum(Evolutionary_distinctiveness)

    ## 3d) Phylogenetic isolation -- Mean of species evolutionary distinctiveness

    # mean(ED)

    mean_Phylogenetic_isolation <- mean(Evolutionary_distinctiveness)

    ## 4d) Phylogenetic isolation -- Variance of species isolation metrics

    #var(ED)

    variance_Phylogenetic_isolation <- var(Evolutionary_distinctiveness)
    cat("-")

    ## Tree topology

    #Gamma index

    ltts <- ltt(this_tree, gamma = TRUE, plot = FALSE)
    lineages_through_time <- as.numeric(ltts[[1]])
    time_steps <- as.numeric(ltts[[2]])
    gamma <- ltts[[3]]
    gamma_p_value <- ltts[[4]]
    cat("-")

    colless_stat <- colless(as.treeshape(this_tree))
    sackin_index <- sackin(as.treeshape(this_tree))
    tree_shape_stat <- shape.statistic(as.treeshape(this_tree))

    ##### (5) Tree metric -- Macroevolutionary - Rate and rate changes ###############
    ##################################################

    ## Speciation vs extinction rates and Net diversification
    bds <- bd(this_tree)
    speciation_rate <- bds[1]
    extinction_rate <- bds[2]
    extinction_per_speciation <- bds[3]
    speciation_minus_extinction <- bds[4]
    cat("-")

    ## Speciation vs extinction rates and Net diversification dependent on trait
    N.for.dom <- table(this_world[, 6])
    if(length(N.for.dom) == 2) {
      par.div.dep <- DivDep( mytree = this_tree, myWorld = this_world)
      trait_1_speciation <- par.div.dep[1]
      trait_2_speciation <- par.div.dep[2]
      trait_1_extinction <- par.div.dep[3]
      trait_2_extinction <- par.div.dep[4]
      transition_from_trait_1_to_2 <- par.div.dep[5]
      transition_from_trait_2_to_1 <- par.div.dep[6]
      transition_rate_ratio_1to2_over_2to1 <- transition_from_trait_1_to_2/transition_from_trait_2_to_1
      cat("-")

      ## Phylogenetic signal (D)
      Phylogenetic_signal <- Dsig(mytree = this_tree, myWorld = this_world)
      cat("-")

      ## Spatial Analysis
      nbs0 <- knearneigh(as.matrix(this_world[, 2:3]), k = 7, longlat = TRUE)
      nbs <- knn2nb(nbs0, sym = TRUE) # 7 symmetric neighbors
      nbs.listw <- nb2listw(nbs)
      factors.nbs <- as.factor(ifelse(is.na(this_world[, 6]), 3, this_world[, 6]))
      spatial.tests <- joincount.test(fx = factors.nbs, listw = nbs.listw)
      spatial.tests.fora <- spatial.tests[[1]]$statistic
      spatial.tests.dom <- spatial.tests[[2]]$statistic
      prevalence <- (N.for.dom[1] - N.for.dom[2]) / sum(N.for.dom)
      cat("-")
    } else {
      trait_1_speciation <- NA
      trait_2_speciation <- NA
      trait_1_extinction <- NA
      trait_2_extinction <- NA
      transition_from_trait_1_to_2 <- NA
      transition_from_trait_2_to_1 <- NA
      transition_rate_ratio_1to2_over_2to1 <- NA
      Phylogenetic_signal <- NA
      spatial.tests.fora <- NA
      spatial.tests.dom <- NA
      prevalence <- ifelse(names(table(this_world[, 6])[1]) == "1", 1,
                           -1)
      cat("---")

    }

    results_summary_matrix_1 <- cbind(

      number_of_branches,
      Pylo_diversity_is_sum_of_BL,
      average_phylogenetic_diversity_is_mean_of_BL,
      variance_Pylo_diversity_is_variance_of_BL,

      F_quadratic_entropy_is_sum_of_PD,
      Mean_pairwise_distance,
      variance_pairwise_distance,

      colless_stat ,
      sackin_index ,
      tree_shape_stat,

      Evolutionary_distinctiveness_sum,
      mean_Phylogenetic_isolation,
      variance_Phylogenetic_isolation,

      gamma,
      gamma_p_value,
      speciation_rate,
      extinction_rate,
      extinction_per_speciation,
      speciation_minus_extinction,
      trait_1_speciation,
      trait_2_speciation ,
      trait_1_extinction ,
      trait_2_extinction ,
      transition_from_trait_1_to_2 ,
      transition_from_trait_2_to_1 ,
      transition_rate_ratio_1to2_over_2to1 ,
      Phylogenetic_signal,
      spatial.tests.fora,
      spatial.tests.dom,
      prevalence
    )
    rownames(results_summary_matrix_1) <- 1

    results_summary_matrix_2 <- cbind(
      c(Evolutionary_distinctiveness,NA),
      lineages_through_time,
      time_steps
    )
    colnames(results_summary_matrix_2) <- c("Evolutionary_distinctiveness",
                                            "lineages_through_time", "time_steps")
    head(results_summary_matrix_2)

    ### Returns from function in list form
    returns <- list(
      #Branch_Lengths,
      #Pairwise_dist,
      results_summary_matrix_1,
      results_summary_matrix_2

    )

    names(returns) <- c(
      #"Branch_Lengths",
      #"Pairwise_distance",
      "results_summary_of_single_value_outputs",
      "results_summary_matrix_of_multi_value_outputs"
    )
    cat("] 100%")

    return(returns)

  }
}


#Module_2(myOut)

References

setwd("~/Box Sync/colliding ranges/Simulations_humans/Results/RF_daily_output")

details <- file.info(list.files())

trimmed_details <- details[which(list.files() == list.files(pattern = "Four_model_compare_results_extinct*")),]
ord <- order(trimmed_details$mtime, decreasing = TRUE)
rownames(trimmed_details[ord,])[1]
load(rownames(trimmed_details[ord,])[1])
extinct <- Concatenated_data
str(fit)
fit$confusion
as.vector(fit$confusion)
as.vector(fit$test$confusion)
fit$importance[,5]
fit$importanceSD[,5]


str(fit)
fit$confusion
plot(fit$err.rate[,1])
lines(fit$err.rate[,2])
lines(fit$err.rate[,3])
lines(fit$err.rate[,4])
lines(fit$err.rate[,5])
library(png)
## First consolidate the available files into a single table
    
      path <- "~/Box Sync/Four model compare/Module 2"
           
     
           setwd(path)
    myfiles_full <- list.dirs()
    analyze_this_many <- length(myfiles_full)
    
    available_files <- matrix(NA, 1, 1)
    
        
    for(i in 1: analyze_this_many){
    available_files <- rbind(available_files , as.matrix(list.files(myfiles_full[i], full.names = TRUE)))
    }
    dim(available_files)
    
    split.file.name <- strsplit(available_files[10], split = "_") 
    
    
    
 
available <- list.files()
files <- matrix(rep(NA, 62), length(available), 62)
dim(files)
i <- 10


for(i in 1:length(available)){
load(available[i])
name <- unlist(strsplit(available[i], split="_"))
files[i,] <- c(as.vector(matrix(name, 1,35)),matrix(Sim_statistics[[1]], 1, 27))

}


colnames(files) <-  c(

    NA,
    "background_takeover_type" ,
    NA,
    "replicate",
    NA,
    "Model_type",
    rep(NA,2),
    "speciation_of_Env_NonD",
    "speciation_of_Env_D",
    "speciation_of_For",
    "speciation_of_Dom",
    NA,
    "extinction_of_Env_NonD",
    "extinction_of_Env_D",
    "extinction_of_For",
    "extinction_of_Dom",
    NA,
    "P.diffusion_Target_forager",
    "P.diffusion_Target_domesticator",
    "P.diffusion_Source_forager",
    "P.diffusion_Source_domesticator",
    NA,
    "P.takeover_Target_forager",
    "P.takeover_Target_domesticator",
    "P.takeover_Source_forager",
    "P.takeover_Source_domesticator",
    NA,
    "arisal_of_Env_NonD",
    "arisal_of_Env_D",
    "arisal_of_For",
    "arisal_of_Dom",
    
    NA, 
    "timesteps", 
    NA,
        
    "number_of_branches",
    "Pylo_diversity_is_sum_of_BL",
    "average_phylogenetic_diversity_is_mean_of_BL",
    "variance_Pylo_diversity_is_variance_of_BL",

    "F_quadratic_entropy_is_sum_of_PD",
    "Mean_pairwise_distance",
    "variance_pairwise_distance",

    "Evolutionary_distinctiveness_sum",
    "mean_Phylogenetic_isolation",
    "variance_Phylogenetic_isolation",

    "gamma",
    "gamma_p_value",
    "speciation_rate",
    "extinction_rate",
    "extinction_per_speciation",
    "speciation_minus_extinction",
    "trait_1_speciation",
    "trait_2_speciation" ,
    "trait_1_extinction" ,
    "trait_2_extinction" ,
    "transition_from_trait_1_to_2" ,
    "transition_from_trait_2_to_1" ,
    "transition_rate_ratio_1to2_over_2to1" ,
    "Phylogenetic_signal",
    "spatial.tests.fora",
    "spatial.tests.dom",
    "prevalence"
    
    
  )

results_table <- as.data.frame(files)
head(results_table)
dim(results_table)
Concatenated_data <- results_table
save(Concatenated_data, file="~/Desktop/Four_model_compare_results.Rdata")

one <- subset(results_table, Model_type=="01" )
two <- subset(results_table, Model_type=="02" )
three <- subset(results_table, Model_type=="03" )
four <- subset(results_table, Model_type=="04" )
crop <- min(length(one[,1]),
length(two[,1]),
length(three[,1]),
length(four[,1]))
one <- one[1:crop,]
two <- two[1:crop,]
three <- three[1:crop,]
four <- four[1:crop,]

Concatenated_data <- rbind(one, two, three, four)
dim(Concatenated_data)





save(Concatenated_data, file=paste0("~/Box Sync/colliding ranges/Simulations_humans/Results/available daily summaries/Four_model_compare_results", format(Sys.time(), format="%d_%b_%Y"),"_crop_to_", crop,".Rdata"))
crop
## First consolidate the available files into a single table
    
      path <- "~/Box Sync/Four model compare/Module 2 extinct"
           
     
           setwd(path)
    myfiles_full <- list.dirs()
    analyze_this_many <- length(myfiles_full)
    
    available_files <- matrix(NA, 1, 1)
    
        
    for(i in 1: analyze_this_many){
    available_files <- rbind(available_files , as.matrix(list.files(myfiles_full[i], full.names = TRUE)))
    }
    dim(available_files)
    
    split.file.name <- strsplit(available_files[10], split = "_") 
    
    
    
 
available <- list.files()
files <- matrix(rep(NA, 62), length(available), 62)
dim(files)
i <- 10


for(i in 1:length(available)){
load(available[i])
name <- unlist(strsplit(available[i], split="_"))
files[i,] <- c(as.vector(matrix(name, 1,35)),matrix(Sim_statistics[[1]], 1, 27))

}


colnames(files) <-  c(

    NA,
    "background_takeover_type" ,
    NA,
    "replicate",
    NA,
    "Model_type",
    rep(NA,2),
    "speciation_of_Env_NonD",
    "speciation_of_Env_D",
    "speciation_of_For",
    "speciation_of_Dom",
    NA,
    "extinction_of_Env_NonD",
    "extinction_of_Env_D",
    "extinction_of_For",
    "extinction_of_Dom",
    NA,
    "P.diffusion_Target_forager",
    "P.diffusion_Target_domesticator",
    "P.diffusion_Source_forager",
    "P.diffusion_Source_domesticator",
    NA,
    "P.takeover_Target_forager",
    "P.takeover_Target_domesticator",
    "P.takeover_Source_forager",
    "P.takeover_Source_domesticator",
    NA,
    "arisal_of_Env_NonD",
    "arisal_of_Env_D",
    "arisal_of_For",
    "arisal_of_Dom",
    
    NA, 
    "timesteps", 
    NA,
        
    "number_of_branches",
    "Pylo_diversity_is_sum_of_BL",
    "average_phylogenetic_diversity_is_mean_of_BL",
    "variance_Pylo_diversity_is_variance_of_BL",

    "F_quadratic_entropy_is_sum_of_PD",
    "Mean_pairwise_distance",
    "variance_pairwise_distance",

    "Evolutionary_distinctiveness_sum",
    "mean_Phylogenetic_isolation",
    "variance_Phylogenetic_isolation",

    "gamma",
    "gamma_p_value",
    "speciation_rate",
    "extinction_rate",
    "extinction_per_speciation",
    "speciation_minus_extinction",
    "trait_1_speciation",
    "trait_2_speciation" ,
    "trait_1_extinction" ,
    "trait_2_extinction" ,
    "transition_from_trait_1_to_2" ,
    "transition_from_trait_2_to_1" ,
    "transition_rate_ratio_1to2_over_2to1" ,
    "Phylogenetic_signal",
    "spatial.tests.fora",
    "spatial.tests.dom",
    "prevalence"
    
    
  )

Concatenated_data <- as.data.frame(files)
head(Concatenated_data)
dim(Concatenated_data)

save(Concatenated_data, file=paste0("~/Box Sync/colliding ranges/Simulations_humans/Results/available daily summaries/Four_model_compare_results_extinct_", format(Sys.time(), format="%d_%b_%Y"),"_crop_to_", crop,".Rdata"))
load('~/Box Sync/colliding ranges/Simulations_humans/Results/available daily summaries/Four_model_compare_results_02_Mar_2017_crop_to_3481.Rdata')
extant <- Concatenated_data
extant
setwd("~/Box Sync/colliding ranges/Simulations_humans/Results/available daily summaries")
details <- file.info(list.files())

trimmed_details <- details[which(list.files() == list.files(pattern = "Four_model_compare_results_extinct*")),]
ord <- order(trimmed_details$mtime, decreasing = TRUE)
rownames(trimmed_details[ord,])[1]
load(rownames(trimmed_details[ord,])[1])
extinct <- Concatenated_data

trimmed_details <- details[which(list.files() != list.files(pattern = "Four_model_compare_results_extinct*")),]
ord <- order(trimmed_details$mtime, decreasing = TRUE)
rownames(trimmed_details[ord,])[1]
load(rownames(trimmed_details[ord,])[1])
extant <- Concatenated_data
dim(extinct)
dim(extant)

for(i in c(9,10,11,12,14,15,16,17,19,20,21,22,24,25,26,27,29,30,31,32)){
    extinct[which(is.nan(as.numeric(as.character(extinct[, i]))) == TRUE), i] <- NA
}

for(i in c(9,10,11,12,14,15,16,17,19,20,21,22,24,25,26,27,29,30,31,32)){
    extant[which(is.nan(as.numeric(as.character(extant[, i]))) == TRUE), i] <- NA
}

i <- 19
for(i in c(20,21,24,25,26,27)){
    extinct[which(as.numeric(as.character(extinct[, i])) == 0), i] <- NA
}

for(i in c(20,21,24,25,26,27)){
    extant[which(as.numeric(as.character(extant[, i])) == 0), i] <- NA
}


xlimit <- c(0,1)
ylimit <- c(0,600)
maincex <- 0.9

png(file="Global_success_rate_per_parameter.png", width=8.5, height=11, units="in", res=300)

par(mfrow=c(5,4), mar=c(3,3,3,0))


hist(as.numeric(as.character(extinct[,9])), main="speciation of F in F env", col=adjustcolor("firebrick", alpha=0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[,9])), main="speciation of F in F env", col=adjustcolor("cornflowerblue", alpha=0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)


hist(as.numeric(as.character(extinct[,10])), main="speciation of D in F env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[,10])), main="speciation of D in F env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)


hist(as.numeric(as.character(extinct[,11])), main="speciation of F in D env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[,11])), main="speciation of F in D env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[,12])), main="speciation of D in D env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[,12])), main="speciation of D in D env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

#######

hist(as.numeric(as.character(extinct[, 14])), main="extinction of F in F env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 14])), main="extinction of F in F env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)



hist(as.numeric(as.character(extinct[, 15])), main="extinction of D in F env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 15])), main="extinction of D in F env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 16])), main="extinction of F in D env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 16])), main="extinction of F in D env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 17])), main="extinction of D in D env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 17])), main="extinction of D in D env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

######

hist(as.numeric(as.character(extinct[, 29])), main="arisal of F in F env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 29])), main="arisal of F in F env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)



hist(as.numeric(as.character(extinct[, 30])), main="arisal of D in F env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 30])), main="arisal of D in F env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 31])), main="arisal of F in D env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 31])), main="arisal of F in D env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 32])), main="arisal of D in D env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 32])), main="arisal of D in D env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

######

hist(as.numeric(as.character(extinct[, 19])), main="NOPE -- Diffusion: source F, target F", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= c(0,18000), cex.main= maincex)
hist(as.numeric(as.character(extant[, 19])), main="NOPE -- Diffusion: source F, target F", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= c(0,18000), cex.main= maincex, add=TRUE)



hist(as.numeric(as.character(extinct[, 20])), main="Diffusion: source D, target F", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 20])), main="Diffusion: source D, target F", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 21])), main="Diffusion: source F, target D", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 21])), main="Diffusion: source F, target D", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 22])), main="NOPE -- Diffusion: source D, target D", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= c(0,18000), cex.main= maincex)
hist(as.numeric(as.character(extant[, 22])), main="NOPE -- Diffusion: source D, target D", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= c(0,18000), cex.main= maincex, add=TRUE)

####

hist(as.numeric(as.character(extinct[, 24])), main="Takeover: source F, target F", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 24])), main="Takeover: source F, target F", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)



hist(as.numeric(as.character(extinct[, 25])), main="Takeover: source D, target F", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 25])), main="Takeover: source D, target F", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 26])), main="Takeover: source F, target D", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 26])), main="Takeover: source F, target D", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 27])), main="Takeover: source D, target D", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 27])), main="Takeover: source D, target D", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)


dev.off()



png(file="extiction minus extant per outcome.png", width=8.5, height=11, units="in", res=300)
par(mfrow=c(3,1))

plot(as.numeric(as.character(extinct[,9])), as.numeric(as.character(extinct[,14])), xlab="speciation", ylab="extinction", col= adjustcolor("firebrick", alpha=0.2), pch=19, cex=0.6, ylim=c(0,1))
plot(as.numeric(as.character(extant[,9])), as.numeric(as.character(extant[,14])), xlab="speciation", ylab="extinction", col= adjustcolor("cornflowerblue", alpha=0.2), pch=19, cex=0.6, ylim=c(0,1))

plot(as.numeric(as.character(extinct[,9])), as.numeric(as.character(extinct[,14])), xlab="speciation", ylab="extinction", col= adjustcolor("firebrick", alpha=0.2), pch=19, cex=0.6, ylim=c(0,1))
points(as.numeric(as.character(extant[,9])), as.numeric(as.character(extant[,14])), xlab="speciation", ylab="extinction", col= adjustcolor("cornflowerblue", alpha=0.2), pch=19, cex=0.6)


dev.off()

params <- extant[,-1:-35]
names(params)
break_number <- 10
xmin <-
xmax <- 

x1 <- as.numeric(params[,1])
h1 <- hist(x1,  plot=FALSE, breaks= break_number)
xfit1 <- seq(xmin, xmax, length= 100)
yfit1 <- dnorm(xfit1, mean=mean(x1), sd=sd(x1))
yfit1 <- yfit1*diff(h1$mids[1:2])*length(x1)+101.5
polygon(xfit1, yfit1, col=adjustcolor("limegreen", alpha=0.5), lwd=2, border=adjustcolor("limegreen", alpha=0.6))

setwd("~/Box Sync/colliding ranges/Simulations_humans/Results/available daily summaries")
details <- file.info(list.files())

trimmed_details <- details[which(list.files() == list.files(pattern = "Four_model_compare_results_extinct*")),]
ord <- order(trimmed_details$mtime, decreasing = TRUE)
rownames(trimmed_details[ord,])[1]
load(rownames(trimmed_details[ord,])[1])
extinct <- Concatenated_data

trimmed_details <- details[which(list.files() != list.files(pattern = "Four_model_compare_results_extinct*")),]
ord <- order(trimmed_details$mtime, decreasing = TRUE)
rownames(trimmed_details[ord,])[1]
load(rownames(trimmed_details[ord,])[1])
extant <- Concatenated_data



load('~/Box Sync/colliding ranges/Simulations_humans/Available trees/real.analysis.RData')

Concatenated_data <- extant
head(Concatenated_data)
#Concatenated_data <- Concatenated_data[Concatenated_data[, 2] == "stats.no.bTO", ]
#Concatenated_data <- Concatenated_data[Concatenated_data[, 6] != "05", ]
# Concatenated_data[, 6] <- as.numeric(Concatenated_data[, 6])
# # Concatenated_data[original[, 2] == "background_takeover", 6] <-  Concatenated_data[original[, 2] == "background_takeover", 6] + 4
Concatenated_data[, 6] <- factor(Concatenated_data[, 6])
#head(Concatenated_data)
#names(Concatenated_data)

PCAdata <- Concatenated_data[, -(1:35)]
PCAdata <- PCAdata[, -12]
PCAdata <- apply(PCAdata, 2, as.numeric)
remove <- apply(is.na(PCAdata), 1, any)
PCAdata <- PCAdata[!remove, ]

# Predictions
library(randomForest)

data.analysis.comp2 <- data.frame("Model" = as.factor(Concatenated_data[!remove, 6]),
                                  PCAdata)
data.analysis.comp2$sprate <- data.analysis.comp2$trait_1_speciation/data.analysis.comp2$trait_2_speciation
data.analysis.comp2$extrate <- data.analysis.comp2$trait_1_extinction/data.analysis.comp2$trait_2_extinction


#load("Real_phy/real.analysis.RData")
a <- as.data.frame(real.analysis$results_summary_of_single_value_outputs)
a$sprate <- a$trait_1_speciation / a$trait_2_speciation
a$extrate <- a$trait_1_extinction / a$trait_2_extinction

data.analysis.comp3 <- data.analysis.comp2[, -c(2, 13:14, 16:20)]
#data.analysis.comp3 <- data.analysis.comp3[data.analysis.comp3$Model %in% 1:4, ]
#data.analysis.comp3$Model <- factor(data.analysis.comp3$Model)
#sub <- unlist(lapply(as.list(c(1:4)), function(x, y) {
#  sample(which(y$Model == x), min(table(data.analysis.comp3$Model)))},
#  y = data.analysis.comp3))
# data.analysis.comp3 <- data.analysis.comp3[sub, ]
fun <- function(x, y, per = .33) {sample(which(y$Model == x), round(table(y$Model)[1]*per))}

sub.test <- unlist(lapply(as.list(paste0(0, c(1:4))), fun,
                          y = data.analysis.comp3))
test2 <- data.analysis.comp3[sub.test, 2:ncol(data.analysis.comp3)]
test1 <- data.analysis.comp3[sub.test, 1]
train <- data.analysis.comp3[-sub.test, ]

train[, -1] <- apply(train[, -1], 2, as.numeric)
test2 <- as.data.frame(apply(test2, 2, as.numeric))
infinites <- which(apply(train[, -1], 2, is.infinite), arr.ind=T)
if (nrow(infinites) > 0) {
train <- train[-infinites[, 1], ]
}
infinites2 <- which(apply(test2, 2, is.infinite), arr.ind=T)
if (nrow(infinites2) > 0) {
test2 <- test2[-infinites2[, 1], ]
test1 <- test1[-infinites2[, 1]]
}


for(i in 1:100){
(fit <- randomForest(Model ~ ., data=train, xtest = test2, ytest = test1, 
                    importance=TRUE, ntree=1000, keep.forest = TRUE, replace=FALSE))

predictions <- predict(fit, 
                       a,
                       type="prob")
predictions

save(fit, file=paste0("~/Box Sync/colliding ranges/Simulations_humans/Results/RF_daily_output/RF_daily_output_", format(Sys.time(), format="%d_%b_%Y"), "_",i, "_NoREPLACEMENT_.Rdata"))
}


save(fit, file=paste0("~/Box Sync/colliding ranges/Simulations_humans/Results/RF_daily_output/RF_daily_output_", format(Sys.time(), format="%d_%b_%Y"),".Rdata"))

plot(fit, ylim=c(0,1))
labs <- c("Basic", "+Diffusion", "+Takeover", "+Diffusion +Takeover")
# bar plot
png("Prob_aus.png", width = 25, height = 25, res = 300, units = "in")
par(mar = c(8, 8, 1, 1))
pred <- setNames(as.numeric(predictions), labs)
cols <- rev(c("darkgreen", "red", "blue", "darkorange1"))
barplot(pred, col = cols, ylab = "Proability", cex.lab = 3, cex.names = 2)
dev.off()
# Plot confusion matrix
png("Conffusion_matrix_all.png", width = 25, height = 25, res=300, units="in")
par(mar = c(10, 11, 1, 1))
colors1 <- colorRampPalette(colors = c("#f0f0f0", "#bdbdbd","#636363"))
prop <- apply(fit$confusion[, -5], 2, function(x){x / sum(x)}) * 100

image(prop, col = colors1(20), axes=FALSE)
axis(1, at=c(0, .33, .66, 1), labels=labs, tick = FALSE, line = FALSE, cex.axis = 3.5, pos = -.19)
axis(2, at=c(0, .33, .66, 1), labels=labs, tick = FALSE, line = FALSE, cex.axis = 3.5)
mtext("ACTUAL", side = 1, padj = 3, cex = 4)
mtext("PREDICTED", side = 2, padj = -3, cex = 4)

for(i in 1:4) {
  for(j in 1:4) {
    text(x = c(0, .33, .66, 1)[i], y = c(0, .33, .66, 1)[j], paste0(round(prop[i, j], 2), "%"),
         cex = 5)
  }
}
dev.off()
importance(fit)
# Variables importance

imp <- importance(fit)
imp <- apply(imp, 2, function(x) (x - min(x))/(max(x) - min(x)))
imp <- imp[sort(imp[, 5], index.return = TRUE, decreasing = TRUE)$ix, ]


names <- rownames(imp)
names[names == "spatial.tests.fora"] <- "Space F"
names[names == "spatial.tests.dom"] <- "Space D"
names[names == "sprate"] <- "Sp(ratio)"
names[names == "transition_from_trait_1_to_2"] <- "TR(1-2)"
names[names == "transition_from_trait_2_to_1"] <- "TR(2-1)"
names[names == "Phylogenetic_signal"] <- "PhySig(D)"
names[names == "Evolutionary_distinctiveness_sum"] <- "EDsum"
names[names == "Pylo_diversity_is_sum_of_BL"] <- "PDsum"
names[names == "transition_rate_ratio_1to2_over_2to1"] <- "TR(ratio)"
names[names == "gamma"] <- "Gamma"
names[names == "mean_Phylogenetic_isolation"] <- "MPI"
names[names == "extrate"] <- "Ext(ratio)"
names[names == "average_phylogenetic_diversity_is_mean_of_BL"] <- "PDmean"
names[names == "extinction_per_speciation"] <- "DR"
names[names == "variance_Phylogenetic_isolation"] <- "VPI"
names[names == "F_quadratic_entropy_is_sum_of_PD"] <- "F"
names[names == "Mean_pairwise_distance"] <- "MPD"
names[names == "variance_Pylo_diversity_is_variance_of_BL"] <- "PDvar"
names[names == "variance_pairwise_distance"] <- "VPD"


png("var_import_all.png", width = 25, height = 25, unit="in", res=300)
par(mar = c(10, 18, 1, 1))
plot(x = rev(imp[, 5]), y = 1:nrow(imp), type = "l", yaxt = "n", 
     ylab = "", xlab = "Variable Importance",
     xlim = c(0, 1), lwd = 2, cex.lab = 4)
for (i in 1:nrow(imp)) {
  abline(h = i, lty = 3, col = "gray80")
}
abline(v = seq(0, 1, 1/19), lty = 3, col = "gray80")

lines(x = rev(imp[, 4]), y = 1:nrow(imp), col = "darkgreen", lwd = 2)
lines(x = rev(imp[, 3]), y = 1:nrow(imp), col = "red", lwd = 2)
lines(x = rev(imp[, 2]), y = 1:nrow(imp), col = "blue", lwd = 2)
lines(x = rev(imp[, 1]), y = 1:nrow(imp), col = "darkorange1", lwd = 2)
lines(x = rev(imp[, 5]), y = 1:nrow(imp), lwd = 3)

points(x = rev(imp[, 4]), y = 1:nrow(imp), col = "darkgreen", cex = 2)
points(x = rev(imp[, 3]), y = 1:nrow(imp), col = "red", cex = 2)
points(x = rev(imp[, 2]), y = 1:nrow(imp), col = "blue", cex = 2)
points(x = rev(imp[, 1]), y = 1:nrow(imp), col = "darkorange1", cex = 2)
points(x = rev(imp[, 5]), y = 1:nrow(imp), pch = 20, cex = 3)


text(y = 1:nrow(imp), x = par("usr")[1] - .17, labels = rev(names),
     srt = 0, pos = 4, xpd = T, cex = 4)
dev.off()
par(mfrow=c(2,3))

# Box plots
boxplot(spatial.tests.fora ~ Model, data = data.analysis.comp3)
abline(h = a$spatial.tests.fora, col = "red", lty = 2)

boxplot(spatial.tests.dom ~ Model, data = data.analysis.comp3)
abline(h = a$spatial.tests.fora, col = "red", lty = 2)

boxplot(log(sprate) ~ Model, data = data.analysis.comp3, ylim = c(-10, 10))
abline(h = log(a$sprate), col = "red", lty = 2)

boxplot(log(extrate) ~ Model, data = data.analysis.comp3, ylim = c(-10, 10))
abline(h = log(a$extrate), col = "red", lty = 2)

boxplot(log(transition_rate_ratio_1to2_over_2to1) ~ Model, data = data.analysis.comp3)
abline(h = log(a$sprate), col = "red", lty = 2)

boxplot(Phylogenetic_signal ~ Model, data = data.analysis.comp3, ylim = c(0, 1))
abline(h = a$Phylogenetic_signal, col = "red", lty = 2)
#build a data tracking table to track parameter changes through time

str(fit)

y
setwd("~/Box Sync/colliding ranges/Simulations_humans/Results/RF_daily_output")

details <- file.info(list.files())

trimmed_details <- details[which(list.files() == list.files(pattern = "Four_model_compare_results_extinct*")),]
ord <- order(trimmed_details$mtime, decreasing = TRUE)
rownames(trimmed_details[ord,])[1]
load(rownames(trimmed_details[ord,])[1])
extinct <- Concatenated_data
str(fit)
fit$confusion
as.vector(fit$confusion)
as.vector(fit$test$confusion)
fit$importance[,5]
fit$importanceSD[,5]


str(fit)
fit$confusion
plot(fit$err.rate[,1])
lines(fit$err.rate[,2])
lines(fit$err.rate[,3])
lines(fit$err.rate[,4])
lines(fit$err.rate[,5])
load('~/Downloads/FULL_TREE_Society_data_with_binary_conversions.Rdata')

load('~/Downloads/Tree_FULL_trimmed.Rdata')

this_tree <- full_tree 


load('~/Downloads/download.Rdata')
objects()
str(myOut)

this_tree <- myOut$mytree
this_world <- myOut$myWorld

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:


stop <- function(){


library(ape)
library(phytools)
library(diversitree)
library(spdep)
library(Rcpp)
library(msm)
library(FARM)
options(repos = c(CRAN = "http://cran.rstudio.com"))
install.packages("devtools")
library(devtools)
install_github("BrunoVilela/FARM")
library(FARM)

load('~/Downloads/FULL_TREE_Society_data_with_binary_conversions.Rdata')

load('~/Downloads/Tree_FULL_trimmed.Rdata')

this_tree <- full_tree 


load('~/Downloads/download.Rdata')
objects()
str(myOut)


this_world <- myOut$myWorld

## This module analyzes the results from module 1 and returns a list based on how many values each stat returns
## Ty Tuff and Bruno Vilela
## 24 August 2016

###### Specify function ##############################

    #  this_tree <- Module_1_output$mytree
    #  this_world <- Module_1_output$myWorld

      ##### (0) Pull necessary variables from simulated trees and organize into a single object for all the tests below to pull from.

      #str(all_trees)
      #str(this_tree)

      ## 0a) Branch lengths
      Branch_Lengths <- this_tree$edge.length
      number_of_branches <- length(Branch_Lengths)

      # Anchor test = PD (Faith's phylogenetic diversity)
      Pylo_diversity_is_sum_of_BL <- sum(Branch_Lengths)

      # avPD -- Average phylogenetic diversity
      average_phylogenetic_diversity_is_mean_of_BL <- mean(Branch_Lengths)

      variance_Pylo_diversity_is_variance_of_BL <- var(Branch_Lengths)
      cat("-")
      ## 0b) Pairwise distance between tips
      Pairwise_dist <- cophenetic(this_tree)
      cat("-")
      # 2b) Pairwise distance -- Sum of pairwise distances

      # F -- Extensive quadratic entropy
      F_quadratic_entropy_is_sum_of_PD <- sum(Pairwise_dist)

      #Mean inter-species distances

      # Anchor test = MPD (mean pairwise distance)

      Mean_pairwise_distance <- mean(Pairwise_dist)

      cat("-")
      #Pairwise distance/all distances -- Variance of pairwise distances

      # Anchor test = VPD (variation of pairwise distance)

      variance_pairwise_distance <- var(as.vector(Pairwise_dist))

      ## 0c) Phylogenetic isolation

      # Using equal.splits method, faster computation
      Evolutionary_distinctiveness <- evol.distinct2(this_tree, type = "equal.splits")
      cat("-")
      # ED - Summed evolutionary distinctiveness

      Evolutionary_distinctiveness_sum <- sum(Evolutionary_distinctiveness)

      ## 3d) Phylogenetic isolation -- Mean of species evolutionary distinctiveness

      # mean(ED)

      mean_Phylogenetic_isolation <- mean(Evolutionary_distinctiveness)

      ## 4d) Phylogenetic isolation -- Variance of species isolation metrics

      #var(ED)

      variance_Phylogenetic_isolation <- var(Evolutionary_distinctiveness)
      cat("-")

      ## Tree topology

      #Gamma index

      ltts <- ltt(this_tree, gamma = TRUE, plot = FALSE)
      lineages_through_time <- as.numeric(ltts[[1]])
      time_steps <- as.numeric(ltts[[2]])
      gamma <- ltts[[3]]
      gamma_p_value <- ltts[[4]]
      cat("-")

      ##### (5) Tree metric -- Macroevolutionary - Rate and rate changes ###############
      ##################################################

      ## Speciation vs extinction rates and Net diversification
      bds <- bd(this_tree)
      speciation_rate <- bds[1]
      extinction_rate <- bds[2]
      extinction_per_speciation <- bds[3]
      speciation_minus_extinction <- bds[4]
      cat("-")

      ## Speciation vs extinction rates and Net diversification dependent on trait
      N.for.dom <- table(this_world[, 6])
      if(length(N.for.dom) == 2) {
        par.div.dep <- DivDep( mytree = this_tree, myWorld = this_world)
        trait_1_speciation <- par.div.dep[1]
        trait_2_speciation <- par.div.dep[2]
        trait_1_extinction <- par.div.dep[3]
        trait_2_extinction <- par.div.dep[4]
        transition_from_trait_1_to_2 <- par.div.dep[5]
        transition_from_trait_2_to_1 <- par.div.dep[6]
        transition_rate_ratio_1to2_over_2to1 <- transition_from_trait_1_to_2/transition_from_trait_2_to_1
        cat("-")

        ## Crown age per trait AUC and effect size
        tip.length <- this_tree$edge.length[this_tree$edge[, 2] %in% 1:Ntip(this_tree)]
        tip.length <- (tip.length - min(tip.length)) / (max(tip.length) - min(tip.length))
        this_trait <- this_world[match(this_tree$tip.label, this_world[, 8]), 6]
        tip.length.2 <- tip.length[this_trait == 2]
        tip.length.1 <- tip.length[this_trait == 1]
        model <- glm(as.factor(this_trait) ~ log(tip.length + 1),
                     family = "binomial")
        effect.size <- model$coefficients[2]
        plot(y = this_trait - 1, x= log(tip.length))
        p <- predict(model, as.factor(this_trait), type = "resp")
        points(y = p, x = log(tip.length), col = "red")
        pr <- prediction(p, as.factor(this_trait))
        auc.model <- performance(pr, measure = "auc")@y.values[[1]]

        ## Phylogenetic signal (D)
        Phylogenetic_signal <- Dsig(mytree = this_tree, myWorld = this_world)
        cat("-")

        ## Spatial Analysis
        nbs0 <- knearneigh(as.matrix(this_world[, 2:3]), k = 7, longlat = TRUE)
        nbs <- knn2nb(nbs0, sym = TRUE) # 7 symmetric neighbors
        nbs.listw <- nb2listw(nbs)
        factors.nbs <- as.factor(ifelse(is.na(this_world[, 6]), 3, this_world[, 6]))
        spatial.tests <- joincount.test(fx = factors.nbs, listw = nbs.listw)
        spatial.tests.fora <- spatial.tests[[1]]$statistic
        spatial.tests.dom <- spatial.tests[[2]]$statistic
        prevalence <- (N.for.dom[1] - N.for.dom[2]) / sum(N.for.dom)
        cat("-")
      } else {
        trait_1_speciation <- NA
        trait_2_speciation <- NA
        trait_1_extinction <- NA
        trait_2_extinction <- NA
        transition_from_trait_1_to_2 <- NA
        transition_from_trait_2_to_1 <- NA
        transition_rate_ratio_1to2_over_2to1 <- NA
        Phylogenetic_signal <- NA
        spatial.tests.fora <- NA
        spatial.tests.dom <- NA
        auc.model <- NA
        effect.size <- NA
        prevalence <- ifelse(names(table(this_world[, 6])[1]) == "1", 1,
                             -1)
        cat("---")

      }

      results_summary_matrix_1 <- cbind(

        number_of_branches,
        Pylo_diversity_is_sum_of_BL,
        average_phylogenetic_diversity_is_mean_of_BL,
        variance_Pylo_diversity_is_variance_of_BL,

        F_quadratic_entropy_is_sum_of_PD,
        Mean_pairwise_distance,
        variance_pairwise_distance,

        Evolutionary_distinctiveness_sum,
        mean_Phylogenetic_isolation,
        variance_Phylogenetic_isolation,

        gamma,
        gamma_p_value,
        speciation_rate,
        extinction_rate,
        extinction_per_speciation,
        speciation_minus_extinction,
        trait_1_speciation,
        trait_2_speciation ,
        trait_1_extinction ,
        trait_2_extinction ,
        transition_from_trait_1_to_2 ,
        transition_from_trait_2_to_1 ,
        transition_rate_ratio_1to2_over_2to1 ,
        Phylogenetic_signal,
        spatial.tests.fora,
        spatial.tests.dom,
        prevalence,
        auc.model,
        effect.size
      )
      rownames(results_summary_matrix_1) <- 1

      results_summary_matrix_2 <- cbind(
        c(Evolutionary_distinctiveness,NA),
        lineages_through_time,
        time_steps
      )
      colnames(results_summary_matrix_2) <- c("Evolutionary_distinctiveness", "lineages_through_time", "time_steps")
     # head(results_summary_matrix_2)

      ### Returns from function in list form
      returns <- list(
        #Branch_Lengths,
        #Pairwise_dist,
        results_summary_matrix_1,
        results_summary_matrix_2

      )

      names(returns) <- c(
        #"Branch_Lengths",
        #"Pairwise_distance",
        "results_summary_of_single_value_outputs",
        "results_summary_matrix_of_multi_value_outputs"
      )
      cat("] 100%")

      

    }
  


#Module_2(myOut)

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

Nielsen et al. (2017)

Module 2: Space and phylogeny summary statistics

This is the master document for Module 2, a foundational function in our FARM package that analyzes results from Module 1, the other foundational function. Module 1 simulates a spatial pattern and a phylogenetic tree given a set of environmental and inheritance rules and then Module 2 summarizes those simulated results using a large set of targeted summary statistics. Here we describe our choice of summary statistics, justify those choices as part of a larger theoretical context, and provide our reproducable code for executing the anaylses yourself. These two parts are seperated into modules so that they can act independently. An combination of spatial pattern and associated phylogeny many be used as long as they are formatted correctly.

This pipeline was designed to analyze a simulated world where all the information is known about both the world and the tree. There is no missing information, just extinct trees. This is much different than our real tree that has loads of uncertainty unevenly destributed across it. The result you see demonstrated right now are one simulated result of many. I need to do a sister page to this were we do this entire analysis on the real tree, or best real tree we’ve got.

We have four types of data available for asking research questions using D-place data: phylogenies, spatial locations, trait identities, and environmental reconstructions. Any one of these four data types alone are relatively information poor, so we are searching for ways to model connections between these data types to draw stronger conclusions overall.

Other modules can use the summary statistics generated from this module to test hypotheses. We currently have a ABC and Random Forest module started but there will be more to come.

These are quantitative connections that we are assumed in the analyses, but we don’t actually have any support in the data for doing so. 1. nearest neighbor connectivity measures 2. Abundance estimates 3. Pairwise influence (history) between cultures. 4. Environmental reconstruction validation evidence

Phylogenetic summary statistics

Whole tree vs. part of tree? These statistics are generally used to compare one sample to another. For example, an experimental contrast between two sites, two phylogenetic groups, or two communities in two different locations. Here we are calculating these statistics for the global langauage tree to compare against global trees created in our simulation. You still retain the ability to subset this tree or others and send only those subsets through this code to compare the values with each other afterwards.

  1. Branch Length (richness and divergence)
  2. Pairwise distance between tips (richness, divergence, and regularity)
  3. Phylogenetic isolation (divergence, and regularity)
  4. Nearest Neighbor (divergence, and regularity)

All trees are ultrametric.

Introduction and framework

The choice of phylogenetic analyses and organizational scheme is based on the suggestions of Tucker et al. 2016. Here are a few images from that paper for an overview:

From Tucker et al. 2016. Biological Reviews From Tucker et al. 2016. Biological Reviews

From Tucker et al. 2016. Biological Reviews

From Tucker et al. 2016. Biological Reviews

From Tucker et al. 2016. Biological Reviews

From Tucker et al. 2016. Biological Reviews

#load('~/Downloads/FULL_TREE_Society_data_with_binary_conversions.Rdata')

load('~/Downloads/Tree_FULL_trimmed.Rdata')

this_tree <- full_tree 


load('~/Downloads/download.Rdata')

#str(myOut)

this_tree <- myOut$mytree
this_world <- myOut$myWorld

#str(this_world)
  #str(all_trees)
  #str(this_tree)


#library(knitr)
#kable(this_world, caption= "This is our world")

Alpha diversity metrics

Branch Lengths

Branch length data is embedded in the tree object provided to this function. The first step in summarizing the lengths is to extract those data from the tree object. These data are called ‘edges’ in the tree object. We extract branch lengths and create an object called ‘Branch_lengths’ for passing on to the other summary functions. The histogram below shows the frequency of different branch lengths found throughout the tree.

      Branch_Lengths <- this_tree$edge.length

We can summarize branch lengths according to normal summary statistics, but it can be difficult to assign evolutionary meaning to some of these metrics and so they are not regularly used as best I can tell. This lack of meaning does not mean that these statistics couldn’t be used to distinguish between large simulated trees.

      mean_branch_length <- mean(Branch_Lengths)
      variance_branch_length <- var(Branch_Lengths)
      SD_branch_length <- sd(Branch_Lengths)

Phylogenetic diversity (\(PD\))

Phylogenetic diversity (\(PD\)) is the summation (\(\sum\)) of all branch lengths connecting species together, where \(B_{t}\) is the set of included tips and \(L_{b}\) is Branch lengths. (Faith 1992) This is an anchor test, which means it is regularly used, well understood, and we should use it to anchor our work to past work. PD is a richness measure, it tells us how much evolutionary history is associated with a set of tips.

\[PD = \sum_{b \in B_{t}}^{}L_{b}\]


# Anchor test = PD (Faith's phylogenetic diversity)
      Pylo_diversity_is_sum_of_BL <- sum(Branch_Lengths)
      Pylo_diversity_is_sum_of_BL

There are variations on this measure that we have NOT implemented here. It is popular to scale this measure according to some ecological driver. Barker (2002) scales branch lengths (\(L_{b}\)) by multiplying them against the abundance of individuals at at tip (\(A_{b}\)). Others (Rosauer et al. 2009), scale them by their range size instead (\(R_{b}\)).

\[\Delta n PD = \sum_{b \in B_{t}}^{}A_{b}L_{b}\] \[PE = \sum_{b \in B_{t}}^{}\dfrac{L_{b}}{R_{b}}\] Argueing that proportional abundance phylogenetic diversity (\(PD_{Ab}\)) is more effective than the standard PD calculated from raw abundance, Vellend et al. penned a new version of PD where \(B\) is the total number of branch lengths (\(L_{b}\)). Note: We don’t have abundance data right now for the human project so this metric is not currently very helpful.
\[PD_{Ab} = B * \dfrac{\sum_{b \in B_{t}}^{}A_{b}L_{b}}{\sum_{b \in B_{t}}^{}A_{b}}\]

      #Calculate B
      number_of_branches <- length(Branch_Lengths)
      number_of_branches

Average phylogenetic diversity (\(avPD\))

Average phylogenetic diversity (\(avPD\)) (introduced by Clarke and Warwick 2001) is a branch length-based divergence indices where PD is divided by the total number of tips (\(S\)) in the tree. \[avPD = \dfrac{PD}{S}\]

      Number_of_tips <- length(this_tree$tip.label)
      average_phylogenetic_diversity <- Pylo_diversity_is_sum_of_BL / Number_of_tips
      average_phylogenetic_diversity

There is also a proportional abundance version of average phylogenetic diversity (\(avPD_{Ab}\)) (Tucker et al. 2016). Again, we don’t have abundance values yet for D-place. \[avPD_{Ab} = \dfrac{B * \dfrac{\sum_{b \in B_{t}}^{}A_{b}L_{b}}{\sum_{b \in B_{t}}^{}A_{b}}}{S}\]

Pairwise distance between tips

This is the patristic distance, the sum of the branch lengths following the shortest distance between two tips in a tree, implemented as a distance matrix where every tip is compared to every other tip. This distance function can be anything. We use euclidean and environmental distance matrices heavily in the spatial analyses.

Calculate the patristic distance between two taxa, for all taxa

Calculate the patristic distance between two taxa using the R package ‘phytools’, this function takes a ‘phylo’ tree object and returns a distance matrix between tips. Need original citation.

    library(phytools)
  
    ls("package:phytools")
 ## Pairwise distance between tips - From library(ape) in library(phytools)
    Pairwise_dist <- cophenetic(this_tree)

yields a distance matrix (list of 2D matrices) of all distances between taxa.

Sum of all pairwise distances (\(F\))

Now we can use a set of summary statistics to describe those pairwise distances. The sum of all pairwise distances, \(F\), is formally called ‘Extensive quadratic entropy’. (Izsak and Papp (2000) and Szeidl (2002)). Just as it was with branch lengths, this is a richness measure and, accordingly, should be used to answer richness questions.

\[F = \sum_{i} \sum_{j} d_{ij}\]

      # F -- Extensive quadratic entropy
      F_quadratic_entropy_is_sum_of_PD <- sum(Pairwise_dist)

Mean pairwise distance (MPD)

Mean inter-species distances. The mean of all pairwise distances, \(MPD\) (a.k.a. \(AvTD\), and \(\Delta^{+}\)), is the mean distance between species. (Clarke and Warwick (1998), Webb et al. (2002, 2008), Kembel et al. (2010)) \[MPD = \dfrac{\sum_{ij} d_{ij}}{S(S-1)}\]

      # Anchor test = MPD (mean pairwise distance)

      Mean_pairwise_distance <- Pairwise_dist / (Number_of_tips * (Number_of_tips - 1) ) 
      

MPD anchored to the root

There is an extention to mean pairwise distance calculations from Helmus et al. (2007) called \(PSV\), \(PSR\), and \(PSE\), phylogenetic species variability, phylogenetic species richness, and phylogenetic species evenness. These measures take the basic pairwise distance calculations and anchor them to the root of the tree so distances have a common denominator. This extention is implemented by using the same equations, just with a constrained set of \(d_{ij}\) conditions. Specifically,

\[PSV = MPD = \dfrac{\sum_{ij} d_{ij}}{S(S-1)}\] \[PSR = \sum_{i} {(\dfrac{1}{S-1} \sum_{j} {d_{ij}})}\] \[PSE = \dfrac{S}{S-1} \sum_{ij} d_{ij}p_{i}p_{j}\]

with these specific values of \(d_{ij}\)

\[d_{ji}=0.5*(c_{ii} + c_{jj} - c_{ij}) \\ \ \\ or \\ \ \\ d_{ij} = 1 - c_{ij} / (\sqrt{c_{ii}c_{jj}}) \] and \[ c_{ii} = the \ sum \ of \ branch \ lengths \ from \ tip \ i \ to \ the \ root \ of \ the \ phylogenetic \ tree. \\ \ \\ c_{ij} = the \ sum \ of \ branch \ lengths \ from \ first \ common \ ancestor \ for \ i \ and \ j \ to \ the \ root. \]

Average distance between two randomly chosen species

\(J\), Intensive quadratic entropy, which is the average distance between two randomly chosen species. (Izsak and Papp 2000) \[J = \dfrac{\sum_{ij}d_{ij}}{S^2} \]

Simpson’s diversity index for pairwise distance

There has been a long effort to pen a phylogenetic analogy to a Simpson’s diversity index. (see Rao (1982), Clarke and Warwick (1998), Pavoine et al. (2005), Hardy and Senterre (2007), Webb et al. (2002, 2008), Kembel et al. (2010)). The conclusion seems to be that this measure is equivilent to scaling \(MPD\) by abundance \(p_{i}\) and \(p_{j}\) to get \(MPD_{Ab}\). This is also a special case of Rao’s Quadratic Entropy, \(Roe's QE\). Note: not using abundance measures yet for D-place data.

\[MPD_{Ab} = \sum_{i} \sum_{j} d_{ij} p_{i} p_{j}\]

Interspecific comparisons of pairwise distances

The interspecific variant (rather than the intraspecific default described above) defines the expected phylogenetic distance between two indivdiuals randomely drawn conditionally on the fact that they indivdulas from different species. \[InterMPD_{Ab} = \dfrac{\sum_{i} \sum_{j \ne i} d_{ij} p_{i} p_{j} }{\sum_{i} \sum_{j \ne i} d_{ij} p_{i} p_{j}} \]

Variance in pairwise distances (\(VPD\))

Variance in pairwise distances, \(VPD\) (a.k.a. \(VarTD\) and \(\Lambda^+\)), is a regularity indices. Clarke and Warwick (2001) Variance is relative to tips, \(S\), not to total branches (\(B\) from above). These are the residuals, they compare each individual pairwise connection to the overall mean.

\[VPD = \dfrac{1}{S(S-1)} (\sum_{i} \sum_{j \ne i} {(d_{ij} - MPD)^2})\]

    #Pairwise distance/all distances -- Variance of pairwise distances

      # Anchor test = VPD (variation of pairwise distance) #need to adjust to equation above. 

      variance_pairwise_distance <- var(as.vector(Pairwise_dist))

Variants of \(VPD\) are \(VPD_{ab}\) and \(InterPVD_{Ab}\), where variance is scaled by abundance or compared in and out of species. These are also regularity indices. \[ VPD_{Ab}= (\sum_{i} \sum_{j} n_{i} n_{j}) * \dfrac{\sum_{i} \sum_{j} n_{i} n_{j} (d_{ij} - MPD_{Ab})^2} {(\sum_{i} \sum_{j} n_{i} n_{j})^2 - \sum_{i} \sum_{j} (n_{i} n_{j})^2} \\ or \\ InterVPD_{Ab} = (\sum_{i} \sum_{j \ne i} n_{i} n_{j}) * \dfrac{\sum_{i} \sum_{j \ne i} n_{i} n_{j} (d_{ij} - InterMPD_{Ab})^2} {(\sum_{i} \sum_{j \ne i} n_{i} n_{j})^2 - \sum_{i} \sum_{j \ne i} (n_{i} n_{j})^2} \]

Nearest phylogenetic neighbor

Divergence indices

Divergence indices using nearest distance: \(MNTD\) and \(MNTD_{Ab}\), Mean nearest taxon distance and Abundance-weighted MNTD. (Webb et al. (2002,2008), Kembel et al. (2010)).

\(MNTD\), mean nearest taxon distance, is the mean shortest distance from a species to all other in the assemblage. Webb et al (2002, 2008), Kembel et al. (2010)
\[ MNTD = \dfrac{1}{S} \sum_{i} d_{i_{min}} \]

\(MNTD_{AB}\), abundance adjusted mean nearest taxon distance. Adjusted by species proportions (i.e. species’ relative abundances) Webb et al (2002, 2008), Kembel et al. (2010)
\[ MNTD_{Ab} = \sum_{i=1}^{S} [d_{i_{min}} * p_{i}] \]

Regularity indices

Regularity indices using nearest distances: \(VNTD\), \(VNTD_{Ab}\), \(PE_{ev}\).

\(VNTD\), Variance in nearest taxon distances, is the variance in nearest pairwise distance. Tucker et al. (2016) \[ VNTD = \dfrac{1}{S} \sum_{i-1}^{S} [(d_{i_{min}} - MNTD)^2] \] \(VNTD_{Ab}\), Abundance weighted variance in nearest taxon distances, is scales by abundance in the same way as descried above. Tucker et al. (2016) \[ VNTD_{Ab} = \dfrac {(\sum_{i} n_{i}) \sum_{i} n_{i} (d_{i_{min}} - MNTD_{Ab})^2} {(\sum_{i} n_{i})^2 - \sum_{i} n_{i} ^2} \]

Phylogenetic version of the funtional \(FE_{ve}\) index

\(PE_ve\), phylogenetic evenness is a phylogenetic version of the funtional \(FE_{ve}\) index. First a minimu spannin tree (MST) is computed using the cophenetic distance obtained from the phylogenetic tree. The MST contains S-1 Branches connection the S species. We denote l a branch on the MST, dist(i,j) is the lengththe branch l that connects species i and j. ni is, as defined abouve, the abundce of species i in the asseblage. Villeger et al. (2008), Dehling et al. (2014).

\[ Weighted \ evenness: \\ EW_{i} = \dfrac{dist(i,j)} {(n_{i} + n_{j})/(\sum_{k=1}^{S}n_{k})} \\ \ \\ Partial \ weighted \ evenness: \\ PEW_{l} = \dfrac {EW_{l}} {\sum_{l=1}^{S-1} EW_{l}} \\ \ \\ Phylogenetic \ evenness: \\ PE_{ve} = \dfrac {\sum_{l=1}^{S-1} min(PEW_{l}, \dfrac{1}{S-1}) - (\dfrac{1}{S-1})} {1- (\dfrac{1}{S-1})} \]

Phylogenetic isolation

A phylogenetic isolation index represents the relative isolation of a given species within a phylogenetic tree. Several indices have been proposed so far but we focus here on the evolutionary distinctiveness index called ‘Fair Proportion’ as proposed by Redding (2003) and Isaac (2007).

Evolutionary distinctiveness (richness indices)

\(ED\), evolutionary distinctiveness is a richness indices. NOTE: not equal to Faith’s PD because the \(ED_{i}\) are computed from the regional pool of species and sumed across a given assemblage (i.e. a subset of the regional species pool). Tucker (2016), Safi et al. (2013), Redding (2003), and Isaac (2007)

\[ ED = \sum_{i}ED_{i} \\ \ \\ where \ ED_{i} = \sum_{b \in B_{t_{i}}} \dfrac{L_{b}}{S_{b}} \]

\(AED\), Abundance-weighted \(ED\). Tucker (2016) and Cadotte et al. (2010) \[ \sum_{i} AED_{i} \\ \ \\ where \ AED_{i} = \sum_{b \in B_{t_{i}}} \dfrac{L_{b}}{A_{b} }* p_{i} \]


# Bruno's function for ED. Provided in library(FARM)

evol.distinct2 <- function (tree, type = c("equal.splits", "fair.proportion"), 
    scale = FALSE, use.branch.lengths = TRUE) 
{
    type <- match.arg(type)
    if (is.rooted(tree) == FALSE) 
        warning("A rooted phylogeny is required for meaningful output of this function", 
            call. = FALSE)
    if (scale == TRUE) {
        if (is.ultrametric(tree) == TRUE) 
            tree$edge.length <- tree$edge.length/(as.numeric(branching.times(tree)[1]))
        else tree$edge.length <- tree$edge.length/sum(tree$edge.length)
    }
    if (use.branch.lengths == FALSE) 
        tree$edge.length <- rep(1, length(tree$edge.length))
    for (i in 1:length(tree$tip.label)) {
        spp <- tree$tip.label[i]
        nodes <- .get.nodes(tree, spp)
        nodes <- nodes[1:(length(nodes) - 1)]
        internal.brlen <- tree$edge.length[which(tree$edge[, 
            2] %in% nodes)]
        if (length(internal.brlen) != 0) {
            internal.brlen <- internal.brlen * switch(type, equal.splits = sort(rep(0.5, 
                length(internal.brlen))^c(1:length(internal.brlen))), 
                fair.proportion = {
                  for (j in 1:length(nodes)) {
                    sons <- .node.desc(tree, nodes[j])
                    n.descendents <- length(sons$tips)
                    if (j == 1) portion <- n.descendents else portion <- c(n.descendents, 
                      portion)
                  }
                  1/portion
                })
        }
        ED <- sum(internal.brlen, tree$edge.length[which.edge(tree, 
            spp)])
        if (i == 1) 
            w <- ED
        else w <- c(w, ED)
    }
    return(w)
}

Evolutionary distinctiveness is our basic measure of phylogenetic isolation. #This should likely be ‘fair proportions’ instead of ‘equal.splits’.

      library(phytools)
      library(FARM)
      # Calculate ED
      # Using equal.splits method, faster computation
     # Evolutionary_distinctiveness_i <- evol.distinct2(this_tree, type = "equal.splits")  
      
      # ED - Summed evolutionary distinctiveness
     # Evolutionary_distinctiveness_sum <- sum(Evolutionary_distinctiveness_i)
#Evolutionary_distinctiveness_sum

We can run some standard summary statistics (mean and variance) on this ED measure. var(Ed) shows up close to VPD on the PCAs in the intro. Tucker(2016)

      # mean(ED)
     # mean_Phylogenetic_isolation <- mean(Evolutionary_distinctiveness_i)
      
      # var(ED)
      #variance_Phylogenetic_isolation <- var(Evolutionary_distinctiveness_i)
#mean_Phylogenetic_isolation
#variance_Phylogenetic_isolation

Mean evolutionary distinctiveness (divergence indices)

The divergence indices version for \(ED\) is mean evolutionary distinctiveness, \(MED\). The mean of evolutionary distinctiveness. Redding (2003) and Isaac (2007) \[ MED = \dfrac {\sum_{i} ED_{i}} {S} \\ \ \\ with \\ \ \\ ED_{i} = \sum_{b \in B_{t_{i}}} \dfrac{L_{b}}{S_{b}} \] #### Entropy measure of evolutonary distinctiveness (regularity indices) The regularity indices for \(ED\)/phylogenetic isolation are \(H_{ED}\), \(E_{ED}\), \(var(ED)\), \(H_{AED}\)

\(H_{ED}\), Entropy measure of evolutionary distinctiveness, is the shannon index applied to evolutionary distinctiveness values. Cadotte et al. (2010) \[ H_{ED} = -\sum_{i=1}^{S} ((\dfrac{ED_{i}}{\sum_{i=1}^{S} ED_{i}}) * \ln (\dfrac{ED_{i}}{\sum_{i=1}^{S} ED_{i}})) \]

\(E_{ED}\), Equitability of evolutionary distinctiveness, is \(H_{ED}\) controlled for species richness. Cadotte et al. (2010)

\[ E_{ED} = \dfrac{H_{ED}}{\ln(S)} \] \(var(ED)\), Variance in evolutinoary distinctiveness, is the variance of species evolutionary distinctiveness. Tucker (2016)

\[ var(ED) = \dfrac{1}{S-1} * \sum_{i=1}^{S} (ED_{i}-\dfrac{\sum_{i=1}^{S} ED_{i}}{S})^2 \] \(H_{ED_{Ab}}\), Abundance-weighted version of \(H_{ED}\). Cadotte et al. (2016)

\[ H_{ED_{Ab}} = -\sum_{i=1}^{S} (\dfrac{n_{i}AED_{i}}{\sum_{i=1}^{S} n_{i}AED_{i}} * \ln(\dfrac{n_{i}AED_{i}}{\sum_{i=1}^{S} n_{i}AED_{i}})) \]

Beta diversity

Beta-diversity indices I. Richness indices (presence–absence data) II. Divergence indices (using pairwise distances among species) 1. Presence/absence data A. Decomposition into , , diversities B. Direct dissimilarities 1. Using all distances 2. Using nearest distances 2. Abundance data A. Decomposition into , , diversities B. Direct dissimilarities III. Parametric indices 1. Equivalent numbers 2. Entropy

Tree topology

Tree topology is a measure of the shape of the overall tree. The tree can be lopsided side-to-side or front-to-back.

Our most trusted index for the tippy vs trunky of a tree is the gamma index, \(\gamma\).The index characterizes the distribution of branching events within the tree. Trees with γ < 0 have relatively longer branches towards the tips of the phylogeny (tippy trees), whereas trees with γ > 0 have relatively longer inter-nodal distances towards the root of the phylogeny (stemmy trees). tk represents an ‘evolutionary period’ (limits are given by two speciation events) or equivalently an internode distance. Pybus and Harvey (2000)

\[ \gamma = \dfrac {(\dfrac{1}{S-2}* \sum_{i=2}^{S-1} (\sum_{k=2}^{i} Kt_{k}))- \dfrac{1}{2} * \sum_{j=2}^{S} jt_{j}} {(\sum_{j=2}^{S} jt_{j}) * \sqrt{\dfrac{1}{12*(S-2)}}} \]

      library(phytools)
      
      ltts <- ltt(this_tree, gamma = TRUE, plot = FALSE)
ltts
str(ltts)
      lineages_through_time <- as.numeric(ltts[[1]])
      time_steps <- as.numeric(ltts[[2]])
      #extract Gamma index
      gamma <- ltts[[3]]
      gamma_p_value <- ltts[[4]]
lineages_through_time 
time_steps 
gamma 
gamma_p_value 

There are two other regularly used metrics that include abundance measures. Note: we don’t have abundance measures for D-place data.

\(IAC\), imbalance of abundance at the clade level, quantifies the relative deviation in the abundance distribution from a null case where individuals are evenly partitioned between clade splits. \(v\) is the number of nodes in the phylogenetic tree. \(n_{i}\) is, as defined above, the abundance of species \(i\) in the assemblage. \(\eta_{k}\) is the expected abundance species \(i\) would have if the abundance was randomly split among lineages in the phylogenetic tree at each speciation event. is the number of lineages originating at node \(k\) in the set \(s(k,root)\), which contains the nodes located on the path between node \(k\) and the root of the phylogenetic tree. N is the total assemblage abundance. Cadotte et al. (2010)

\[ \dfrac{\sum_{i=1}^{S} |n_{i} - \hat{n_{i}}|} {v} \\ \ \\ where \\ \ \\ \hat{n_{i}} = \dfrac{N}{\prod_{K \in s(i, root)}\eta_{k}} \]

\(I_{c}\), the Colless index, is the sum of the absolute differences in species richness between sister-clades at each internal node. For fully resolved trees, each internal node defines two sister-clades. \(S_{1k}\) is the number of species descending from the first clade defined by node k and \(S_{2k}\) that of the second clade. \(v\) is, as defined above, the number of nodes in the phylogenetic. Colless (1982)

\[ I_{c} = \sum_{k=1}^{v} |S_{1k} - S_{2k}| \]

Macroevolutionary rates


#function name = bd, function input = tree of type 'phylo'
print(bd) 
 ## Speciation vs extinction rates and Net diversification
     

      bds <- bd(this_tree)
      speciation_rate <- bds[1]
      extinction_rate <- bds[2]
      extinction_per_speciation <- bds[3]
      speciation_minus_extinction <- bds[4]
  

      ## Speciation vs extinction rates and Net diversification dependent on trait
     # N.for.dom <- table(this_world[, 6])
  #    if(length(N.for.dom) == 2) {
        par.div.dep <- DivDep( mytree = this_tree, myWorld = this_world)
        trait_1_speciation <- par.div.dep[1]
        trait_2_speciation <- par.div.dep[2]
        trait_1_extinction <- par.div.dep[3]
        trait_2_extinction <- par.div.dep[4]
        transition_from_trait_1_to_2 <- par.div.dep[5]
        transition_from_trait_2_to_1 <- par.div.dep[6]
        transition_rate_ratio_1to2_over_2to1 <- transition_from_trait_1_to_2/transition_from_trait_2_to_1
      
library(ROCR)
        ## Crown age per trait AUC and effect size
        tip.length <- this_tree$edge.length[this_tree$edge[, 2] %in% 1:Ntip(this_tree)]
        tip.length <- (tip.length - min(tip.length)) / (max(tip.length) - min(tip.length))
        this_trait <- this_world[match(this_tree$tip.label, this_world[, 8]), 6]
        tip.length.2 <- tip.length[this_trait == 2]
        tip.length.1 <- tip.length[this_trait == 1]
        model <- glm(as.factor(this_trait) ~ log(tip.length + 1),
                     family = "binomial")
        effect.size <- model$coefficients[2]
       # plot(y = this_trait - 1, x= log(tip.length))
        p <- predict(model, as.factor(this_trait), type = "resp")
       # points(y = p, x = log(tip.length), col = "red")
        pr <- prediction(p, as.factor(this_trait))
        auc.model <- performance(pr, measure = "auc")@y.values[[1]]

      
  ## Phylogenetic signal (D)
        Phylogenetic_signal <- Dsig(mytree = this_tree, myWorld = this_world)
        

Spatial Locations


library(spdep)
 ## Spatial Analysis
        nbs0 <- knearneigh(as.matrix(this_world[, 2:3]), k = 7, longlat = TRUE)
        nbs <- knn2nb(nbs0, sym = TRUE) # 7 symmetric neighbors
        nbs.listw <- nb2listw(nbs)
        factors.nbs <- as.factor(ifelse(is.na(this_world[, 6]), 3, this_world[, 6]))
        spatial.tests <- joincount.test(fx = factors.nbs, listw = nbs.listw)
        spatial.tests.fora <- spatial.tests[[1]]$statistic
        spatial.tests.dom <- spatial.tests[[2]]$statistic
        #prevalence <- (N.for.dom[1] - N.for.dom[2]) / sum(N.for.dom)
results_summary_matrix_1 <- cbind(

        number_of_branches,
        #Pylo_diversity_is_sum_of_BL,
        #average_phylogenetic_diversity_is_mean_of_BL,
        #variance_Pylo_diversity_is_variance_of_BL,

        F_quadratic_entropy_is_sum_of_PD,
        Mean_pairwise_distance,
        variance_pairwise_distance,

        #Evolutionary_distinctiveness_sum,
        #mean_Phylogenetic_isolation,
        #variance_Phylogenetic_isolation,

        gamma,
        gamma_p_value,
        speciation_rate,
        extinction_rate,
        extinction_per_speciation,
        speciation_minus_extinction,
        trait_1_speciation,
        trait_2_speciation ,
        trait_1_extinction ,
        trait_2_extinction ,
        transition_from_trait_1_to_2 ,
        transition_from_trait_2_to_1 ,
        transition_rate_ratio_1to2_over_2to1 ,
        Phylogenetic_signal,
        spatial.tests.fora,
        spatial.tests.dom,
       # prevalence,
       # auc.model,
        effect.size
      )
      #rownames(results_summary_matrix_1) <- 1

      #results_summary_matrix_2 <- cbind(
      #  c(Evolutionary_distinctiveness,NA),
      #  lineages_through_time,
      #  time_steps
      #)
      #colnames(results_summary_matrix_2) <- c("Evolutionary_distinctiveness", "lineages_through_time", "time_steps")
      #head(results_summary_matrix_2)

      ### Returns from function in list form
      #returns <- list(
        #Branch_Lengths,
        #Pairwise_dist,
      #  results_summary_matrix_1,
      #  results_summary_matrix_2

      #)

      #names(returns) <- c(
        #"Branch_Lengths",
        #"Pairwise_distance",
       # "results_summary_of_single_value_outputs",
       # "results_summary_matrix_of_multi_value_outputs"
      #)
      
      

Module2() returns these two matrices as a list

Running an R script on the cluster requires two parts: an R script with the code to be run and a PBS script to control how that R script is run on the cluster.

There are two different clusters at Wustl.edu, an old and a new cluster. This script runs the R code on the new cluster.

#!/bin/bash
#PBS -N Four_model_run 
#PBS -V                     
#PBS -l walltime=23:59:00               
#PBS -l pmem=1200mb 
#PBS -l nodes=1:ppn=1:haswell 
#PBS -t 1-1000


echo $PBS_ARRAYID

cd /home/cbotero/mydirectory/Four_model_compare
module load R

export R_LIBS=$HOME/rlibs
#R CMD INSTALL --library=/home/ttuff/rlibs  FARM_1.0.tar.gz

Rscript --vanilla ./FARM_four_model_compare.R ${PBS_ARRAYID}

We were regularly causing problems running to many jobs on the new cluster and we were asked to move to the old cluster. This cluster has slower individual processors, but we can run more jobs at one time, so productivity has stayed about the same.

#!/bin/bash
#PBS -N FARM_third_run_old 
#PBS -V                     
#PBS -l walltime=160:00:00              
#PBS -l pmem=1200mb 
#PBS -q old
#PBS -l nodes=1:ppn=1:nehalem 
#PBS -t 1-500


echo $PBS_ARRAYID

cd /home/ttuff/mydirectory/Four_model_compare
module load R

export R_LIBS=$HOME/rlibs
#R CMD INSTALL --library=/home/ttuff/rlibs  FARM_1.0.tar.gz

Rscript --vanilla ./FARM_four_model_compare.R ${PBS_ARRAYID} 

The final argument in the #PBS script above (#PBS -t 1-500) controls the serial running schema for running many simultanious instances of the R script at a time. This argument is passes to R as an integer value using the following arguements inside R.


args <- commandArgs(trailingOnly = FALSE) #7 elements are passed from the PBS

NAI <- as.numeric(args[7]) # the seventh of those elements is the array integer.

Here is a working example of how to set up the R script.

#install.packages("rfoaas")
library(rfoaas)

##If the PBS script started this code running and passed the number 13
##to this particular run of a larger serial set. 


#args <- commandArgs(trailingOnly = FALSE) 

NAI <- 13 #as.numeric(args[7])

    sayHello <- function(loop_number){
      print(paste0("I can count to ", loop_number, "!   ", cool(from="Ty")))
    }

    sayHello(NAI)

You logon to the cluster using linux/unix code from the command line terminal on you computer. Open the terminal and put in you login info.

ssh -Y ttuff@login.chpc.wustl.edu
password:_______

Upon first login, you will be in a folder called ‘HOME’ with a series of system files in it. You will want to use an FTP client to view and organize these files. I prefer Filezilla, but there are several other good clients available for free. Download filezilla, make sure it’s in your applications folder, and open it. You should see a window that looks like a newer version of this. The left panels are the files on the computer you’re working from and the right two panels will show the files on the server once you log in through Filezilla also.

A fresh Filezilla window

A fresh Filezilla window

Start a new server link

Start a new server link

Start a new site and name that new site

Start a new site and name that new site

Select a secure ssh file transfer protocol

Select a secure ssh file transfer protocol

Login as normal

Login as normal

Enter password

Enter password

You need to create two new directories (folders) for us to work out of. Call one rlibs and the other mydirectory

You need to create two new directories (folders) for us to work out of. Call one ‘rlibs’ and the other ‘mydirectory’

Within mydirectory, create another folder called Four_model_compare

Within mydirectory, create another folder called ‘Four_model_compare’

Within Four_model_compare, create two folders called Module_1_outputs and Module_2_outputs. Then drag three files from your computer files on the left to the server folders on the right: one .R file, one .pbs file, and a zip file with the package FARM in it. These should be named FARM_four_model_compare.R, FARM_four_model_compare_old_cluster.pbs, and FARM_1.0.tar.gz .

Within ‘Four_model_compare’, create two folders called ‘Module_1_outputs’ and ‘Module_2_outputs’. Then drag three files from your computer files on the left to the server folders on the right: one .R file, one .pbs file, and a zip file with the package FARM in it. These should be named FARM_four_model_compare.R, FARM_four_model_compare_old_cluster.pbs, and FARM_1.0.tar.gz .

Once logged in, you need to change the directory

cd /home/ttuff/mydirectory/Four_model_compare

Barker, Gary M. 2002. “Phylogenetic diversity : a quantitative framework for measurement of priority and achievement in biodiversity conservation.” Biological Journal of the Linnean Society 76: 165–94. http://www.jstor.org/stable/3070944?seq=1{\#}page{\_}scan{\_}tab{\_}contents.

Cadotte, Marc W., T. Jonathan Davies, James Regetz, Steven W Kembel, Elsa Cleland, and Todd H. Oakley. 2010. “Phylogenetic diversity metrics for ecological communities : integrating species richness, abundance and evolutionary history.” Ecology Letters 13: 96–105. doi:10.1111/j.1461-0248.2009.01405.x.

Clarke, KR, and RM Warwick. 1998. “Quantifying structural redundancy in ecological communities.” Oecologia 113: 278–89. http://link.springer.com/article/10.1007/s004420050379.

———. 2001. “A further biodiversity index applicable to species lists: variation in taxonomic distinctness.” Marine Ecology Progress Series 216: 265–78. http://researchrepository.murdoch.edu.au/id/eprint/23107/.

Colless, Donald H. 1982. “Review of phylogenetics: the theory and practice of phylogenetic systematics.” Systematic Zoology 31 (1): 100–104. http://www.jstor.org/stable/2413420.

Dehling, D Matthias, Susanne A Fritz, Till Töpfer, Martin Päckert, Patrizia Estler, Katrin Böhning-gaese, and Matthias Schleuning. 2014. “Functional and phylogenetic diversity and assemblage structure of frugivorous birds along an elevational gradient in the tropical Andes IBS special issue.” Ecography 37: 1047–55. doi:10.1111/ecog.00623.

Faith, Daniel P. 1992. “Conservation evaluation and phylogenetic diversity.” Biological Conservation 61: 1–10. http://www.sciencedirect.com/science/article/pii/0006320792912013.

Hardy, Olivier J, and Bruno Senterre. 2007. “Characterizing the phylogenetic structure of communities by an additive partitioning of phylogenetic diversity.” Journal of Ecology 95: 493–506. doi:10.1111/j.1365-2745.2007.01222.x.

Helmus, Matthew R, Wendel (Bill) Keller, Michael J Paterson, Norman D Yan, Charles H Cannon, and James A Rusak. 2010. “Communities contain closely related species during ecosystem disturbance.” Ecology Letters 13: 162–74. doi:10.1111/j.1461-0248.2009.01411.x.

Isaac, Nick J B, Samuel T Turvey, Ben Collen, Carly Waterman, and Jonathan E M Baillie. 2007. “Mammals on the EDGE : Conservation Priorities Based on Threat and Phylogeny.” PLoS ONE 2 (3): e296. doi:10.1371/journal.pone.0000296.

Kembel, Steven W, Peter D Cowan, Matthew R Helmus, William K Cornwell, Helene Morlon, David D Ackerly, Simon P Blomberg, and Campbell O Webb. 2010. “Picante : R tools for integrating phylogenies and ecology.” Bioinformatics 26 (11): 1463–4. doi:10.1093/bioinformatics/btq166.

Nielsen, Rasmus, Joshua M Akey, Mattias Jakobsson, Jonathan K Pritchard, Sarah Tishkoff, and Eske Willerslev. 2017. “Tracing the peopling of the world through genomics.” Nature 541: 302–10. doi:10.1038/nature21347.

Pavoine, S Ã, S Ollier, and D Pontier. 2005. “Measuring diversity from dissimilarities with Rao’s quadratic entropy: Are any dissimilarities suitable?” Theoretical Population Biology 67 (4): 231–39. doi:10.1016/j.tpb.2005.01.004.

Pybus, Oliver G, and Paul H Harvey. 2000. “Testing macro-evolutionary models using incomplete molecular phylogenies.” Proceedings of the Royal Society B 267 (1459): 2267–72. doi:10.1098/rspb.2000.1278.

Rao, Radhakrishna C. 1982. “Diversity and Dissimilarity coefficients: a unified approach.” Theoretical Population Biology 21: 24–43. http://www.sciencedirect.com/science/article/pii/0040580982900041.

Rosauer, Dan F, Shawn W Laffan, Michael D Crisp, and Stephen C Donnellan. 2009. “Phylogenetic endemism : a new approach for identifying geographical concentrations of evolutionary history.” Molecular Ecology 18: 4061–72. doi:10.1111/j.1365-294X.2009.04311.x.

Safi, Kamran, Katrina Armour-marshall, Jonathan E M Baillie, and Nick J B Isaac. 2013. “Global Patterns of Evolutionary Distinct and Globally Endangered Amphibians and Mammals.” PLoS ONE 8 (5): e63582–e63582. doi:10.1371/journal.pone.0063582.

Tucker, Caroline M., Marc W. Cadotte, Silvia B. Carvalho, T. Jonathan Davies, Simon Ferrier, Susanne A. Fritz, Rich Grenyer, et al. 2016. “A guide to phylogenetic metrics for conservation, community ecology and macroecology.” Biological Reviews. doi:10.1111/brv.12252.

Vellend, Mark, William K Cornwell, Karen Magnuson-ford, and Arne Ø Mooers. 2011. “Measuring phylogenetic biodiversity.” In Biological Diversity: Frontiers in Measurement and Assessment, 194–207. Oxford, UK: Oxford University Press. http://phylodiversity.net/wcornwell/Vellend{\_}etal{\_}2011{\_}bookchap.pdf.

Villeger, Sebastien, Norman WH Mason, and David Mouillot. 2008. “New multidimensional functional diversity indices for a multifaceted framework in functional ecology.” Ecology 89 (8): 2290–2301. http://onlinelibrary.wiley.com/doi/10.1890/07-1206.1/full.

Webb, Campbell O, David D Ackerly, and Steven W Kembel. 2008. “Phylocom: software for the analysis of phylogenetic community structure and trait evolution.” Bioinformatics 24 (18): 2098–2100. doi:10.1093/bioinformatics/btn358.

Webb, Campbell O, David D Ackerly, Mark A Mcpeek, and Michael J Donoghue. 2002. “Phylogenies and Community Ecology.” Annual Review of Ecology and Systematics 33: 475–505. doi:10.1146/annurev.ecolsys.33.010802.150448.

---
title: 'D-place FARM documentation: Master script'
author: "Ty Tuff, Bruno Vilela, and Carlos Botero"
date: 'project began: 15 May 2016, document updated: `r strftime(Sys.time(), format
  = "%d %B %Y")`'
output:
  pdf_document: default
  html_notebook: default
  html_document: default
  word_document: default
bibliography: FARM package.bib
---

There are two versions of this script, the first is for running each simulation to the 
end and then saving the final step as the output of the model and the second 
is to save outputs along the way so we can evaluate how different models change through time. 

Here is the first, and primary, version:

```{r}


#####################################################################

# Run the full model in a cluster. This version writes files to a cluster output folder.
# rm(list = ls())
# install.packages("~/Desktop/FARM_1.0.tar.gz", repos=NULL, type="source")


#####################################################################
## need to document which functions we use from each of these libraries. 
library(ape)
library(spdep)
library(Rcpp)
library(msm)
library(FARM)


sim_run_cluster <- function(replicate_cycle, myWorld, number_of_time_steps, nbs,
                            number_of_tips, number_of_neighbors, origins, start = NULL) {
  # Calls the full simulation script 
  #	 
  # Purpose: Need to wrap the entire simulation script into a function so it can be called in parallel from a cluster call 	
  #
  # Args:
  #    replicate_cycle: An integer indicating the replicate number of a simulation. This variable is used in this function to label        
  #			the saved output file and control the number of replicates run by the cluster.
  #
  #    combo_number: An interger between 1 and 31 indicating the combinations of S, E, A, D, and T modules to be included 
  #			in the simulation. The full list of these combinations can be printed using the function combo_of_choice(28, TRUE).
  # 		We are currently using combinations 25,28,29,and 31 as our four competing models for the spread of agriculture.  
  #
  #    myWorld: Matrix that defines the scope of the available world and acts as a data hub for organizing and reporting 	  
  #			results from the different elements of the simulation. 
  #
  #    number_of_time_steps: An integer indicating how many iterations the simulation will calculated before writing the data 
  #			file. 
  #
  #    nbs: A list of the available neighbors for each spatial point. This is passed to the function for calculating the interaction 
  #			of neighbors through time. 
  #
  #    number_of_tips: An interger indicating the number of tree tips the simulation should be truncated to. The default is to 
  #			include all the available tips (e.g. 1254 for human languages). 
  #
  # Returns: 
  #    myOut: A list object containing a 'phylo' tree object called mytree in the first position and the myWorld matrix of 
  #      	spatial and tree data in the second position 
  #		
  

  x1 <- 4 #Number of runs per core
  sampleer <- sample(c(1,2,5,6), x1)
  #if (replicate_cycle != 1) {
  #  replicate_cycle <- ((replicate_cycle - 1) * x1) + 1
 # }
 # replicate_cycle <- replicate_cycle:(replicate_cycle + (x1 - 1))
  for (count in sampleer) {
  independent <- 1 

    
    # Probability of Arisal
    prob_choose_a <- rev(sort(rexp(4, rate = 9)))
    prob_choose_a <- prob_choose_a[c(sample(1:2, 2), sample(3:4, 2))]
    prob_choose_a[3] <- 0
    P.Arisal0  <- parameters(prob_choose_a[1], prob_choose_a[4],
                             prob_choose_a[3], prob_choose_a[2],
                             "Env_NonD", "Env_D",
                             "Evol_to_F", "Evol_to_D")
    # P.Arisal0 is the one you should change the parameters
    P.Arisal <- matrix(NA, ncol = 2, nrow = nrow(myWorld)) # probability per cell
    colnames(P.Arisal) <- c("Evolve_to_F", "Evolve_to_D")
    Env.Dom <- myWorld[, 7] == 2
    P.Arisal[Env.Dom, 1] <- P.Arisal0[1, 2]
    P.Arisal[!Env.Dom, 1] <- P.Arisal0[1, 1]
    P.Arisal[Env.Dom, 2] <- P.Arisal0[2, 2]
    P.Arisal[!Env.Dom, 2] <- P.Arisal0[2, 1]
    
    colnames(P.Arisal) <- c("Prob_of_Foraging", "Prob_of_Domestication")
    P.Arisal[which(origins == FALSE), 2]  <- 0
    
    #####
    #prob_choose <- runif(12, 0.01, 1)
    #sub <- (prob_choose[1] - 0.01)
    #sub <- ifelse(sub < .1, .1, sub)
    #prob_choose[c(4)] <- runif(1, 0.01, sub)
    #prob_choose[c(5)] <- runif(1, 0.1, 1) # High extinction
    #prob_choose[c(6)] <- runif(1, 0, (prob_choose[3] - 0.01))
    #prob_choose[c(9, 10, 12)] <- runif(3, 0.01, prob_choose[11])
    
    ####
    prob_choose <- runif(12, 0, 1)
    top <- min(prob_choose[c(1,3)], na.rm=TRUE)
    prob_choose[c(2)] <- runif(1, 0, top)
    
    prob_choose[c(5)] <- runif(1, 0, prob_choose[c(2)]) 
    prob_choose[c(6)] <- runif(1, 0, prob_choose[c(5)])
    prob_choose[c(4)] <- runif(1, prob_choose[c(6)], prob_choose[c(5)])
    
        
    if (count == 1) {
      prob_choose[7:12] <- 0
    }
    if (count == 2) {
      prob_choose[9:12] <- 0
    }
    if (count == 3 | count == 5) {
      prob_choose[7:8] <- 0
      independent <- 0
    }
    if (count == 4 | count == 6) {
      independent <- 0
    }
    
    
    P.speciation <- parameters(prob_choose[1], prob_choose[1],
                               prob_choose[2], prob_choose[3],
                               "Env_NonD", "Env_D", "For", "Dom")

    P.extinction  <- parameters(prob_choose[4], prob_choose[4],
                                prob_choose[5], prob_choose[6],
                                "Env_NonD", "Env_D", "For", "Dom")

    
    P.diffusion <- parameters(0, prob_choose[7],
                              prob_choose[8], 0,
                              "Target_For", "Target_Dom",
                              "Source_For", "Source_Dom")
    
    P.TakeOver <- parameters(prob_choose[9], prob_choose[10],
                             prob_choose[11], prob_choose[12],
                             "Target_For", "Target_Dom",
                             "Source_For", "Source_Dom")
    multiplier <- 1 # always 1 now.
   if (count %in% 1:4) { 
    	myOut <- RunSimUltimate(myWorld, P.extinction, P.speciation, 
                            P.diffusion, P.Arisal, P.TakeOver, nbs, independent,
                            N.steps = number_of_time_steps, silent = FALSE, 
                            multiplier = multiplier, start = start)
                            }
   if (count %in% 5:6) {
     	myOut <- RunSimUltimate.push(myWorld, P.extinction, P.speciation, 
                            P.diffusion, P.Arisal, P.TakeOver, nbs, independent,
                            N.steps = number_of_time_steps, silent = FALSE, 
                            multiplier = multiplier, start = start)
                            }
                            
                            
    # Count refers to the combo, 1 = null, 2 = diffusion, 3 = Takeover, 4 = full
    save(myOut,  file= paste0("./Module_1_outputs/myOut_rep_",
                              formatC(replicate_cycle, width = 2,flag = 0),
                              "_combo_",
                              formatC(count, width = 2,flag = 0),
                              "_","params", "_P.speciation_",
                              paste(formatC(P.speciation, width = 2,flag = 0), collapse="_"),"_P.extinct_",
                              paste(formatC(P.extinction, width = 2,flag = 0), collapse="_"), "_P.diffus_",
                              paste(formatC(P.diffusion, width = 2,flag = 0), collapse="_"), "_P.TO_",
                              paste(formatC(P.TakeOver, width = 2,flag = 0), collapse="_"),"_P.Arisal_",
                              paste(formatC(P.Arisal0, width = 2,flag = 0), collapse="_"),
                              "_timesteps_", number_of_time_steps, "_NBS_", number_of_neighbors, "_.Rdata"))

    Sim_statistics <- Module_2(myOut)

    save(Sim_statistics, file= paste0("./Module_2_outputs/Sim_stats_rep_",
                                      formatC(replicate_cycle, width = 2,flag = 0),
                                      "_combo_",
                                      formatC(count, width = 2,flag = 0),
                                      "_","params", "_P.speciation_",
                                      paste(formatC(P.speciation, width = 2,flag = 0), collapse="_"),"_P.extinct_",
                                      paste(formatC(P.extinction, width = 2,flag = 0), collapse="_"), "_P.diffus_",
                                      paste(formatC(P.diffusion, width = 2,flag = 0), collapse="_"), "_P.TO_",
                                      paste(formatC(P.TakeOver, width = 2,flag = 0), collapse="_"),"_P.Arisal_",
                                      paste(formatC(P.Arisal0, width = 2,flag = 0), collapse="_"),
                                      "_timesteps_", number_of_time_steps, "_NBS_", number_of_neighbors, "_.Rdata"))
  }
  
}

#####################################################################
coords <- as.matrix(apply(language_centroids[, 3:4], 2, as.numeric)) #coords
conds <- ifelse(suitability2 == 0, 1, 2)
conds[is.na(conds)] <- sample(c(1, 2), sum(is.na(conds)), replace = TRUE) 
origins <- language_centroids[, 5]

##### Specify simulation parameters #################################

number_of_tips <- length(coords[,1])
number_of_time_steps_a <- 30000
#replicate_cycle <- c(1)  #number of replicates
#####################################################################
data("parameters.table")


system.time(
  myWorld <- BuildWorld(coords, conds)
)
number_of_neighbors <- sample(5:9,1)

nbs <- knn2nb(knearneigh(coords, k = number_of_neighbors, longlat = TRUE),
              sym = TRUE) # 7 symmetric neighbors
n.obs <- sapply(nbs, length)
seq.max <- seq_len(max(n.obs))
nbs <- t(sapply(nbs, "[", i = seq.max))

dim(myWorld)


# NAI <- 1000
args <- commandArgs(trailingOnly = FALSE)
print(args)
NAI <- as.numeric(args[7])
#setwd("~/Box Sync/colliding ranges/Simulations_humans/big world cluster outputs")


# Starting point
start <- sample((1:nrow(language_centroids))[as.logical(language_centroids[, 6])], 1)

#sim_run_cluster(replicate_cycle = NAI,
#                myWorld, number_of_time_steps = number_of_time_steps_a, 
#                nbs, number_of_tips = nrow(myWorld), number_of_neighbors= number_of_neighbors, #origins=origins,start = start)



```


Here is the second version

```{r}


#####################################################################

# Run the full model in a cluster. This version writes files to a cluster output folder.
# rm(list = ls())
# install.packages("~/Desktop/FARM_1.0.tar.gz", repos=NULL, type="source")


#####################################################################
## need to document which functions we use from each of these libraries. 
library(ape)
library(spdep)
library(Rcpp)
library(msm)
library(FARM)


sim_run_cluster <- function(replicate_cycle, myWorld, number_of_time_steps, nbs,
                            number_of_tips, number_of_neighbors, origins, start = NULL) {
  # Calls the full simulation script 
  #	 
  # Purpose: Need to wrap the entire simulation script into a function so it can be called in parallel from a cluster call 	
  #
  # Args:
  #    replicate_cycle: An integer indicating the replicate number of a simulation. This variable is used in this function to label        
  #			the saved output file and control the number of replicates run by the cluster.
  #
  #    combo_number: An interger between 1 and 31 indicating the combinations of S, E, A, D, and T modules to be included 
  #			in the simulation. The full list of these combinations can be printed using the function combo_of_choice(28, TRUE).
  # 		We are currently using combinations 25,28,29,and 31 as our four competing models for the spread of agriculture.  
  #
  #    myWorld: Matrix that defines the scope of the available world and acts as a data hub for organizing and reporting 	  
  #			results from the different elements of the simulation. 
  #
  #    number_of_time_steps: An integer indicating how many iterations the simulation will calculated before writing the data 
  #			file. 
  #
  #    nbs: A list of the available neighbors for each spatial point. This is passed to the function for calculating the interaction 
  #			of neighbors through time. 
  #
  #    number_of_tips: An interger indicating the number of tree tips the simulation should be truncated to. The default is to 
  #			include all the available tips (e.g. 1254 for human languages). 
  #
  # Returns: 
  #    myOut: A list object containing a 'phylo' tree object called mytree in the first position and the myWorld matrix of 
  #      	spatial and tree data in the second position 
  #		
  

  x1 <- 4 #Number of runs per core
  sampleer <- sample(c(1,2,5,6), x1)
  #if (replicate_cycle != 1) {
  #  replicate_cycle <- ((replicate_cycle - 1) * x1) + 1
 # }
 # replicate_cycle <- replicate_cycle:(replicate_cycle + (x1 - 1))
  for (count in sampleer) {
  independent <- 1 

    
    # Probability of Arisal
    prob_choose_a <- rev(sort(rexp(4, rate = 9)))
    prob_choose_a <- prob_choose_a[c(sample(1:2, 2), sample(3:4, 2))]
    prob_choose_a[3] <- 0
    P.Arisal0  <- parameters(prob_choose_a[1], prob_choose_a[4],
                             prob_choose_a[3], prob_choose_a[2],
                             "Env_NonD", "Env_D",
                             "Evol_to_F", "Evol_to_D")
    # P.Arisal0 is the one you should change the parameters
    P.Arisal <- matrix(NA, ncol = 2, nrow = nrow(myWorld)) # probability per cell
    colnames(P.Arisal) <- c("Evolve_to_F", "Evolve_to_D")
    Env.Dom <- myWorld[, 7] == 2
    P.Arisal[Env.Dom, 1] <- P.Arisal0[1, 2]
    P.Arisal[!Env.Dom, 1] <- P.Arisal0[1, 1]
    P.Arisal[Env.Dom, 2] <- P.Arisal0[2, 2]
    P.Arisal[!Env.Dom, 2] <- P.Arisal0[2, 1]
    
    colnames(P.Arisal) <- c("Prob_of_Foraging", "Prob_of_Domestication")
    P.Arisal[which(origins == FALSE), 2]  <- 0
    
    #####
    #prob_choose <- runif(12, 0.01, 1)
    #sub <- (prob_choose[1] - 0.01)
    #sub <- ifelse(sub < .1, .1, sub)
    #prob_choose[c(4)] <- runif(1, 0.01, sub)
    #prob_choose[c(5)] <- runif(1, 0.1, 1) # High extinction
    #prob_choose[c(6)] <- runif(1, 0, (prob_choose[3] - 0.01))
    #prob_choose[c(9, 10, 12)] <- runif(3, 0.01, prob_choose[11])
    
    ####
    prob_choose <- runif(12, 0, 1)
    top <- min(prob_choose[c(1,3)], na.rm=TRUE)
    prob_choose[c(2)] <- runif(1, 0, top)
    
    prob_choose[c(5)] <- runif(1, 0, prob_choose[c(2)]) 
    prob_choose[c(6)] <- runif(1, 0, prob_choose[c(5)])
    prob_choose[c(4)] <- runif(1, prob_choose[c(6)], prob_choose[c(5)])
    
        
    if (count == 1) {
      prob_choose[7:12] <- 0
    }
    if (count == 2) {
      prob_choose[9:12] <- 0
    }
    if (count == 3 | count == 5) {
      prob_choose[7:8] <- 0
      independent <- 0
    }
    if (count == 4 | count == 6) {
      independent <- 0
    }
    
    
    P.speciation <- parameters(prob_choose[1], prob_choose[1],
                               prob_choose[2], prob_choose[3],
                               "Env_NonD", "Env_D", "For", "Dom")

    P.extinction  <- parameters(prob_choose[4], prob_choose[4],
                                prob_choose[5], prob_choose[6],
                                "Env_NonD", "Env_D", "For", "Dom")

    
    P.diffusion <- parameters(0, prob_choose[7],
                              prob_choose[8], 0,
                              "Target_For", "Target_Dom",
                              "Source_For", "Source_Dom")
    
    P.TakeOver <- parameters(prob_choose[9], prob_choose[10],
                             prob_choose[11], prob_choose[12],
                             "Target_For", "Target_Dom",
                             "Source_For", "Source_Dom")
    multiplier <- 1 # always 1 now.
   if (count %in% 1:4) { 
    	myOut <- RunSimUltimate2(myWorld, P.extinction, P.speciation, P.diffusion, P.Arisal, 
    P.TakeOver, nbs, independent, number_of_time_steps, multiplier, silent = TRUE, 
    count, resolution = seq(1, number_of_time_steps, 100), P.Arisal0, start = NULL)
                            }
   if (count %in% 5:6) {
     	myOut <- RunSimUltimate2.push(myWorld, P.extinction, P.speciation, P.diffusion, P.Arisal, 
    P.TakeOver, nbs, independent, number_of_time_steps, multiplier, silent = TRUE, 
    count, resolution = seq(1, number_of_time_steps, 100), P.Arisal0, start = NULL)
                            }
                            
                           
  }
  
}

#####################################################################
coords <- as.matrix(apply(language_centroids[, 3:4], 2, as.numeric)) #coords
conds <- ifelse(suitability2 == 0, 1, 2)
conds[is.na(conds)] <- sample(c(1, 2), sum(is.na(conds)), replace = TRUE) 
origins <- language_centroids[, 5]

##### Specify simulation parameters #################################

number_of_tips <- length(coords[,1])
number_of_time_steps_a <- 50000
#replicate_cycle <- c(1)  #number of replicates
#####################################################################
data("parameters.table")


system.time(
  myWorld <- BuildWorld(coords, conds)
)
number_of_neighbors <- sample(5:9,1)

nbs <- knn2nb(knearneigh(coords, k = number_of_neighbors, longlat = TRUE),
              sym = TRUE) # 7 symmetric neighbors
n.obs <- sapply(nbs, length)
seq.max <- seq_len(max(n.obs))
nbs <- t(sapply(nbs, "[", i = seq.max))

dim(myWorld)


# NAI <- 1000
args <- commandArgs(trailingOnly = FALSE)
print(args)
NAI <- as.numeric(args[7])
#setwd("~/Box Sync/colliding ranges/Simulations_humans/big world cluster outputs")


# Starting point
start <- sample((1:nrow(language_centroids))[as.logical(language_centroids[, 6])], 1)

#sim_run_cluster(replicate_cycle = NAI,
#                myWorld, number_of_time_steps = number_of_time_steps_a, 
#                nbs, number_of_tips = nrow(myWorld), number_of_neighbors= number_of_neighbors, origins=origins,start = start)



```







<!--chapter:end:Call_modules_master_script_markdown.Rmd-->

---
title: 'D-place FARM documentation: Flow chart of expansion rules'
author: "Ty Tuff, Bruno Vilela, and Carlos Botero"
date: 'project began: 15 May 2016, document updated: `r strftime(Sys.time(), format
  = "%d %B %Y")`'
output:
  pdf_document: default
  html_notebook: default
  html_document: default
  word_document: default
bibliography: FARM package.bib
---


```{r}
install.packages("DiagrammeR")
library(DiagrammeR)
```




```{r}

grViz("
  digraph {
    
    node [shape = box
          fontname = Helvetica
          penwidth = 2.0]
    'Input the data hub'; 
    'Is there more than one societ in the world?'; 
    'Pick the only available society';
    'Randomly choos a spreader society';
    'Are there any empty neighboring locations?';
    'Attempt to Takover';
    'Is the spreader a Forager of Domesticator?';
    'is there neighbors with suitable enivonrment for domestication?';
    'Randomly choose a target neighbor';
    'Randomly choose a taret neighbor among the ones with suitable condition for domestication';
    'Randomly choos a number between 0-1. Is this number smaller than the probability of takeover for the confict category**?';
    'Spreader removers target from the space and phylogeny';
    'Nothing happens';
    'Speader occupies the target cell and bifurcate in the phylogeny';
    'Are there any more societies remaining (excluding new societies generated during this loop) that did not go through the expansion process?';
    'Attempt to dispersal';
    'Randomly choos an empty target location';
    'Randomly choos a number between 0-1. Is this number smaller than the probabilityy of speciation for the spreader societ trait in the pre=expanding location environment?';
    'output to data hub';

    edge []
    'Input the data hub' -> 'Is there more than one societ in the world?';
    'Is there more than one societ in the world?' -> 'Pick the only available society' [label = 'NO'];
    'Is there more than one societ in the world?' -> 'Randomly choos a spreader society' [label = 'YES'];
    'Pick the only available society' -> 'Are there any empty neighboring locations?';
    'Randomly choos a spreader society' -> 'Are there any empty neighboring locations?';
    'Are there any empty neighboring locations?' -> 'Attempt to dispersal' [label = 'YES'];
    'Are there any empty neighboring locations?' -> 'Attempt to Takover' [label = 'NO'];
    'Attempt to Takover' -> 'Is the spreader a Forager of Domesticator?';
    'Is the spreader a Forager of Domesticator?' -> 'is there neighbors with suitable enivonrment for domestication?' [label = 'Domesticator'];
    'Is the spreader a Forager of Domesticator?' -> 'Randomly choose a target neighbor' [label = 'Forager'];
    'is there neighbors with suitable enivonrment for domestication?' -> 'Randomly choose a target neighbor' [label = 'NO'];
    'is there neighbors with suitable enivonrment for domestication?' -> 'Randomly choose a taret neighbor among the ones with suitable condition for domestication' [label = 'YES'];
    'Randomly choose a taret neighbor among the ones with suitable condition for domestication' -> 'Randomly choos a number between 0-1. Is this number smaller than the probability of takeover for the confict category**?';
    'Randomly choose a target neighbor'  -> 'Randomly choos a number between 0-1. Is this number smaller than the probability of takeover for the confict category**?' ;
    'Randomly choos a number between 0-1. Is this number smaller than the probability of takeover for the confict category**?' -> 'Spreader removers target from the space and phylogeny' [label = 'YES'];
    'Randomly choos a number between 0-1. Is this number smaller than the probability of takeover for the confict category**?' -> 'Nothing happens' [label = 'NO'];
    'Nothing happens' -> 'Are there any more societies remaining (excluding new societies generated during this loop) that did not go through the expansion process?';
    'Are there any more societies remaining (excluding new societies generated during this loop) that did not go through the expansion process?' ->  'output to data hub' [label = 'NO'];
    'Attempt to dispersal' -> 'Randomly choos an empty target location';
    'Randomly choos an empty target location' -> 'Randomly choos a number between 0-1. Is this number smaller than the probabilityy of speciation for the spreader societ trait in the pre=expanding location environment?';
    'Randomly choos a number between 0-1. Is this number smaller than the probabilityy of speciation for the spreader societ trait in the pre=expanding location environment?' -> 'Speader occupies the target cell and bifurcate in the phylogeny' [label = 'YES'];
    'Randomly choos a number between 0-1. Is this number smaller than the probabilityy of speciation for the spreader societ trait in the pre=expanding location environment?' -> 'Nothing happens' [label = 'NO'];
  'Are there any more societies remaining (excluding new societies generated during this loop) that did not go through the expansion process?' ->  'Is there more than one societ in the world?' [label = 'YES'];
  'Speader occupies the target cell and bifurcate in the phylogeny' -> 'Are there any more societies remaining (excluding new societies generated during this loop) that did not go through the expansion process?';
  'Spreader removers target from the space and phylogeny' -> 'Speader occupies the target cell and bifurcate in the phylogeny';
  }")






```


<!--chapter:end:Flow_charts.Rmd-->

---
title: "D-place FARM documentation: Module 1"
author: "Ty Tuff, Bruno Vilela, and Carlos Botero"
date: 'project began: 15 May 2016, document updated: `r strftime(Sys.time(), format
  = "%d %B %Y")`'
output:
  html_notebook: default
  html_document: default
  pdf_document: default
  word_document: default
bibliography: FARM package.bib
---

# Module 1: Simulation to produce a world and a tree 

```{r}
# Install the most recent version of FARM from a .zip file
install.packages(file.choose(), repos=NULL) 
```

```{r}
library(FARM)
ls("package:FARM")
```

## Inputs
 
```{r}

```


## Module 1 functions
#### The first set of RunSim functions are the default pipeline where only one output is saved at the end of the simulation. 


This first function controls error messages coming from the primary function below. 
```{r}
# Run the simulation function skiping the erros and atributing NA if it occurs
RunSimUltimate <- function(myWorld, P.extinction, P.speciation,
                           P.diffusion, P.Arisal, P.TakeOver, nbs, independent,
                           N.steps, multiplier,
                           silent = TRUE, start = NULL) {

  result <- try(RunSim(myWorld, P.extinction, P.speciation,
                       P.diffusion, P.Arisal, P.TakeOver, nbs,
                       independent, N.steps,
                       multiplier, start = start), silent = silent)
  if (class(result) == "try-error") {
    result <- NA
  }
  return(result)
}

```

This is the primary function running the simulation.
```{r}
#==================================================================
# SimulationFunctions.R
#
# Contains a function for simulation of cultural evolution in space and time
# Allows for (1) Vertical Transmission (phylogenetic inheritance); (2) Horizontal
# Transmission (cultural diffusion); (3) Ecological selection (Both speciation and
# extinction are determined by the match between the state of a binary trait and the
# environment a societuy occupies).
#
# 7 Jun 2016
# Carlos A. Botero, Bruno Vilela & Ty Tuff
# Washington University in Saint Louis
#==================================================================
RunSim <- function(myWorld, P.extinction, P.speciation,
                   P.diffusion, P.Arisal, P.TakeOver, nbs, independent,
                   N.steps, multiplier, start) {
  # myWorld = The hexagonal world created with the function BuildWorld
  # P.extinction = Probability matrix of extinction
  # P.speciation = Probability matrix of speciation
  # P.diffusion = Probability matrix of diffusion
  # P.Arisal = Probability matrix of arisal
  # P.TakeOver = Probability matrix of takeover
  # N.steps = Number of steps in the model
  # multiplier = The number that will multiply the probabilities according
  # to environmetal fitness.
  # start = the point ID in 'myWorld' that will give risen to humans.
  # (humans origin will be in one of the existing positions)

  world.size <- nrow(myWorld)
  # Initialize parameters we will use later to build the phylogeny
  rootnode <-  world.size + 1 # standard convention for root node number

  # set the seed for simulation
  if (is.null(start)) {
  start <- sample(1:world.size, 1)
  }

  myWorld[start, 4:6] <- c(0, 0, 1) # Setting root(0), time(0), ancestral(1, forager)

  mytree <- TheOriginOfSpecies(world.size, start) # Empty tree
  myT <- 0 # Time starts at zero

  # Common input and output for all the internal modules
  input <- list(P.speciation, P.Arisal, P.diffusion, P.extinction, P.TakeOver,
                myWorld, mytree, myT, multiplier, nbs, independent)

  # Functions order to be randomized
  rand_order_func_run <- list("Extinction", "Diffusion", "SpeciationTakeOver", "Arisal")

  cat("0% [") # Time count

  for (steps in 1:N.steps) { # Starts the loop with 'n' steps

    if (steps %% round((N.steps / 10)) == 0) { # Time count
      cat('-') # Time count
    }# Time count
    if (steps == N.steps) { # Time count
      cat("] 100 %\n")# Time count
    }# Time count

    # Randomize functions order
    rand_order <- sample(rand_order_func_run)
    # Run the functions
    input <- do.call(rand_order[[1]], list(input = input))
    input <- do.call(rand_order[[2]], list(input = input))
    input <- do.call(rand_order[[3]], list(input = input))
    input <- do.call(rand_order[[4]], list(input = input))

  }
  # Trunsform the input/output into the final result and return it
  myWorld <- as.data.frame(input[[6]])
  myWorld[, 8] <- paste0("t", myWorld[, 8])
  mytree <- makePhy(input[[7]])
  mytree$edge.length <- mytree$edge.length / N.steps
  return(list('mytree' = mytree, 'myWorld' = myWorld))
}

```






#Push versions 

```{r}
# Run the simulation function skiping the erros and atributing NA if it occurs
RunSimUltimate.push <- function(myWorld, P.extinction, P.speciation,
                                P.diffusion, P.Arisal, P.TakeOver, nbs, independent,
                                N.steps, multiplier,
                                silent = TRUE, start = NULL) {

  result <- try(RunSim.push(myWorld, P.extinction, P.speciation,
                            P.diffusion, P.Arisal, P.TakeOver, nbs,
                            independent, N.steps,
                            multiplier, start = start), silent = silent)
  if (class(result) == "try-error") {
    result <- NA
  }
  return(result)
}


```





```{r}
#==================================================================
# SimulationFunctions.R
#
# Contains a function for simulation of cultural evolution in space and time
# Allows for (1) Vertical Transmission (phylogenetic inheritance); (2) Horizontal
# Transmission (cultural diffusion); (3) Ecological selection (Both speciation and
# extinction are determined by the match between the state of a binary trait and the
# environment a societuy occupies).
#
# 7 Jun 2016
# Carlos A. Botero, Bruno Vilela & Ty Tuff
# Washington University in Saint Louis
#==================================================================
RunSim.push <- function(myWorld, P.extinction, P.speciation,
                   P.diffusion, P.Arisal, P.TakeOver, nbs, independent,
                   N.steps, multiplier, start) {
  # myWorld = The hexagonal world created with the function BuildWorld
  # P.extinction = Probability matrix of extinction
  # P.speciation = Probability matrix of speciation
  # P.diffusion = Probability matrix of diffusion
  # P.Arisal = Probability matrix of arisal
  # P.TakeOver = Probability matrix of takeover
  # N.steps = Number of steps in the model
  # multiplier = The number that will multiply the probabilities according
  # to environmetal fitness.
  # start = the point ID in 'myWorld' that will give risen to humans.
  # (humans origin will be in one of the existing positions)

  world.size <- nrow(myWorld)
  # Initialize parameters we will use later to build the phylogeny
  rootnode <-  world.size + 1 # standard convention for root node number

  # set the seed for simulation
  if (is.null(start)) {
    start <- sample(1:world.size, 1)
  }

  myWorld[start, 4:6] <- c(0, 0, 1) # Setting root(0), time(0), ancestral(1, forager)

  mytree <- TheOriginOfSpecies(world.size, start) # Empty tree
  myT <- 0 # Time starts at zero

  # Common input and output for all the internal modules
  input <- list(P.speciation, P.Arisal, P.diffusion, P.extinction, P.TakeOver,
                myWorld, mytree, myT, multiplier, nbs, independent)

  # Functions order to be randomized
  rand_order_func_run <- list("Extinction", "Diffusion",
                              "SpeciationTakeOver.push", "Arisal")

  cat("0% [") # Time count

  for (steps in 1:N.steps) { # Starts the loop with 'n' steps

    if (steps %% round((N.steps / 10)) == 0) { # Time count
      cat('-') # Time count
    }# Time count
    if (steps == N.steps) { # Time count
      cat("] 100 %\n")# Time count
    }# Time count

    # Randomize functions order
    rand_order <- sample(rand_order_func_run)
    # Run the functions
    input <- do.call(rand_order[[1]], list(input = input))
    input <- do.call(rand_order[[2]], list(input = input))
    input <- do.call(rand_order[[3]], list(input = input))
    input <- do.call(rand_order[[4]], list(input = input))

  }
  # Trunsform the input/output into the final result and return it
  myWorld <- as.data.frame(input[[6]])
  myWorld[, 8] <- paste0("t", myWorld[, 8])
  mytree <- makePhy(input[[7]])
  mytree$edge.length <- mytree$edge.length / N.steps
  return(list('mytree' = mytree, 'myWorld' = myWorld))
}

```







































#### The second set of RunSim functions save an output each timestep if we want to look at trends through time. We use this to make videos of the simulation running. 
```{r}
RunSimUltimate2 <- function (myWorld, P.extinction, P.speciation, P.diffusion, P.Arisal, 
    P.TakeOver, nbs, independent, N.steps, multiplier, silent = TRUE, 
    count, resolution = seq(1, N.steps, 100), P.Arisal0, start = NULL) 
{
    result <- try(RunSim2(myWorld, P.extinction, P.speciation, 
        P.diffusion, P.Arisal, P.TakeOver, nbs, independent, 
        N.steps, multiplier, count = count, resolution = resolution, 
        P.Arisal0 = P.Arisal0, start), silent = silent)
    if (class(result) == "try-error") {
        result <- NA
    }
    return(result)
}
```


```{r}
RunSim2 <- function (myWorld, P.extinction, P.speciation, P.diffusion, P.Arisal, 
    P.TakeOver, nbs, independent, N.steps, multiplier, count, 
    resolution, P.Arisal0, start = NULL) 
{
    folder <- paste0("./Module_1_outputs/myOut_rep_", formatC(count, 
        width = 2, flag = 0), "_combo_", formatC(count, width = 2, 
        flag = 0), "_", "params", "_P.speciation_", paste(formatC(P.speciation, 
        width = 2, flag = 0), collapse = "_"), "_P.extinction_", 
        paste(formatC(P.extinction, width = 2, flag = 0), collapse = "_"), 
        "_P.diffusion_", paste(formatC(P.diffusion, width = 2, 
            flag = 0), collapse = "_"), "_P.TO_", paste(formatC(P.TakeOver, 
            width = 2, flag = 0), collapse = "_"), "_P.Arisal_", 
        paste(formatC(P.Arisal0, width = 2, flag = 0), collapse = "_"), 
        "_timesteps_", N.steps)
    world.size <- nrow(myWorld)
    rootnode <- world.size + 1
    if (is.null(start)) {
        start <- sample(1:world.size, 1)
    }
    myWorld[start, 4:6] <- c(0, 0, 1)
    mytree <- TheOriginOfSpecies(world.size, start)
    myT <- 0
    input <- list(P.speciation, P.Arisal, P.diffusion, P.extinction, 
        P.TakeOver, myWorld, mytree, myT, multiplier, nbs, independent)
    rand_order_func_run <- list("Extinction", "Diffusion", "SpeciationTakeOver", 
        "Arisal")
    cat("0% [")
    for (steps in 1:N.steps) {
        if (steps%%round((N.steps/10)) == 0) {
            cat("-")
        }
        if (steps == N.steps) {
            cat("] 100 %\n")
        }
        rand_order <- sample(rand_order_func_run)
        input <- do.call(rand_order[[1]], list(input = input))
        input <- do.call(rand_order[[2]], list(input = input))
        input <- do.call(rand_order[[3]], list(input = input))
        input <- do.call(rand_order[[4]], list(input = input))
        if (steps %in% resolution) {
            myWorld <- as.data.frame(input[[6]])
            myWorld[, 8] <- paste0("t", myWorld[, 8])
            if (nrow(na.omit(input[[7]])) > 1) {
                mytree <- makePhy(input[[7]])
            }
            else {
                mytree <- NA
            }
            myOut <- list(mytree = mytree, myWorld = myWorld)
            save(myOut, file = paste0(folder, "_", formatC(steps, 
                10, flag = 0), ".Rdata"))
            stats <- Module_2(myOut)
            save(stats, file = paste0(folder, "_", formatC(steps, 
                10, flag = 0), "_stats", ".Rdata"))
        }
    }
    myWorld <- as.data.frame(input[[6]])
    myWorld[, 8] <- paste0("t", myWorld[, 8])
    mytree <- makePhy(input[[7]])
    mytree$edge.length <- mytree$edge.length/N.steps
    return(list(mytree = mytree, myWorld = myWorld))
}
```


#..And the push version of saving each time step


```{r}
# Run the simulation function skiping the erros and atributing NA if it occurs
RunSimUltimate2.push <- function(myWorld, P.extinction, P.speciation,
                            P.diffusion, P.Arisal, P.TakeOver, nbs, independent,
                            N.steps, multiplier,
                            silent = TRUE, count, resolution = seq(1, N.steps, 100),
                            P.Arisal0, start = NULL) {

  result <- try(RunSim2.push(myWorld, P.extinction, P.speciation,
                        P.diffusion, P.Arisal, P.TakeOver, nbs,
                        independent, N.steps,
                        multiplier, count = count, resolution = resolution,
                        P.Arisal0 = P.Arisal0, start),
                silent = silent)
  if (class(result) == "try-error") {
    result <- NA
  }
  return(result)
}

```


```{r}
#==================================================================
RunSim2.push <- function(myWorld, P.extinction, P.speciation,
                    P.diffusion, P.Arisal, P.TakeOver, nbs, independent,
                    N.steps, multiplier, count, resolution, P.Arisal0,
                    start = NULL) {
  # myWorld = The hexagonal world created with the function BuildWorld
  # P.extinction = Probability matrix of extinction
  # P.speciation = Probability matrix of speciation
  # P.diffusion = Probability matrix of diffusion
  # P.Arisal = Probability matrix of arisal
  # P.TakeOver = Probability matrix of takeover
  # N.steps = Number of steps in the model
  # multiplier = The number that will multiply the probabilities according
  # to environmetal fitness.
  # start = the point ID in 'myWorld' that will give risen to humans.
  # (humans origin will be in one of the existing positions)
  folder <- paste0("./Module_1_outputs/myOut_rep_",
                   formatC(count, width = 2,flag = 0),
                   "_combo_",
                   formatC(count, width = 2,flag = 0),
                   "_","params", "_P.speciation_",
                   paste(formatC(P.speciation, width = 2,flag = 0),
                         collapse="_"),"_P.extinction_",
                   paste(formatC(P.extinction, width = 2,flag = 0),
                         collapse="_"), "_P.diffusion_",
                   paste(formatC(P.diffusion, width = 2,flag = 0),
                         collapse="_"), "_P.TO_",
                   paste(formatC(P.TakeOver, width = 2,flag = 0),
                         collapse="_"),"_P.Arisal_",
                   paste(formatC(P.Arisal0, width = 2,flag = 0),
                         collapse="_"), "_timesteps_",
                   N.steps)
  world.size <- nrow(myWorld)
  # Initialize parameters we will use later to build the phylogeny
  rootnode <-  world.size + 1 # standard convention for root node number

  # set the seed for simulation
  if (is.null(start)) {
    start <- sample(1:world.size, 1)
  }

  myWorld[start, 4:6] <- c(0, 0, 1) # Setting root(0), time(0), ancestral(1, forager)

  mytree <- TheOriginOfSpecies(world.size, start) # Empty tree
  myT <- 0 # Time starts at zero

  # Common input and output for all the internal modules
  input <- list(P.speciation, P.Arisal, P.diffusion, P.extinction, P.TakeOver,
                myWorld, mytree, myT, multiplier, nbs, independent)

  # Functions order to be randomized
  rand_order_func_run <- list("Extinction", "Diffusion",
                              "SpeciationTakeOver.push", "Arisal")

  cat("0% [") # Time count

  for (steps in 1:N.steps) { # Starts the loop with 'n' steps

    if (steps %% round((N.steps / 10)) == 0) { # Time count
      cat('-') # Time count
    }# Time count
    if (steps == N.steps) { # Time count
      cat("] 100 %\n")# Time count
    }# Time count

    # Randomize functions order
    rand_order <- sample(rand_order_func_run)
    # Run the functions
    input <- do.call(rand_order[[1]], list(input = input))
    input <- do.call(rand_order[[2]], list(input = input))
    input <- do.call(rand_order[[3]], list(input = input))
    input <- do.call(rand_order[[4]], list(input = input))
    # Save
    if(steps %in% resolution) {
      myWorld <- as.data.frame(input[[6]])
      myWorld[, 8] <- paste0("t", myWorld[, 8])
      if(nrow(na.omit(input[[7]])) > 1) {
        mytree <- makePhy(input[[7]])
      } else {
        mytree <- NA
      }
      myOut <- list('mytree' = mytree, 'myWorld' = myWorld)
      save(myOut, file= paste0(folder,"_", formatC(steps, 10, flag = 0), ".Rdata"))
      stats <- Module_2(myOut)
      save(stats, file= paste0(folder,"_", formatC(steps, 10, flag = 0),
                               "_stats", ".Rdata"))
    }
  }
  # Trunsform the input/output into the final result and return it
  myWorld <- as.data.frame(input[[6]])
  myWorld[, 8] <- paste0("t", myWorld[, 8])
  mytree <- makePhy(input[[7]])
  mytree$edge.length <- mytree$edge.length / N.steps
  return(list('mytree' = mytree, 'myWorld' = myWorld))
}


```






<!--chapter:end:Module_1_markdown.Rmd-->

---
title: "D-place FARM documentation: Module 2"
author: "Ty Tuff, Bruno Vilela, and Carlos Botero"
date: 'project began: 15 May 2016, document updated: `r strftime(Sys.time(), format
  = "%d %B %Y")`'
output:
  html_notebook: default
  html_document: default
  pdf_document: default
  word_document: default
bibliography: FARM package.bib
---


# Module 2: Space and phylogeny summary statistics 
  This is the master document for Module 2, a foundational function in our FARM package that analyzes results from Module 1, the other foundational function. Module 1 simulates a spatial pattern and a phylogenetic tree given a set of environmental and inheritance rules and then Module 2 summarizes those simulated results using a large set of targeted summary statistics. Here we describe our choice of summary statistics, justify those choices as part of a larger theoretical context, and provide our reproducable code for executing the anaylses yourself. These two parts are seperated into modules so that they can act independently. An combination of spatial pattern and associated phylogeny many be used as long as they are formatted correctly.         

This pipeline was designed to analyze a simulated world where all the information is known about both the world and the tree. There is no missing information, just extinct trees. This is much different than our real tree that has loads of uncertainty unevenly destributed across it. The result you see demonstrated right now are one simulated result of many. I need to do a sister page to this were we do this entire analysis on the real tree, or best real tree we've got. 

We have four types of data available for asking research questions using D-place data: [phylogenies](#phylogenetic-summary-statistics), [spatial locations](#spatial-locations), trait identities, and environmental reconstructions. Any one of these four data types alone are relatively information poor, so we are searching for ways to model connections between these data types to draw stronger conclusions overall. 

Other modules can use the summary statistics generated from this module to test hypotheses. We currently have a ABC and Random Forest module started but there will be more to come. 

These are quantitative connections that we are assumed in the analyses, but we don't actually have any support in the data for doing so. 
1. nearest neighbor connectivity measures
2. Abundance estimates
3. Pairwise influence (history) between cultures. 
4. Environmental reconstruction validation evidence 

# Phylogenetic summary statistics  

Whole tree vs. part of tree? These statistics are generally used to compare one sample to another. For example, an experimental contrast between two sites, two phylogenetic groups, or two communities in two different locations. Here we are calculating these statistics for the global langauage tree to compare against global trees created in our simulation. You still retain the ability to subset this tree or others and send only those subsets through this code to compare the values with each other afterwards. 

* Introduction and framework
* [Alpha Diversity metrics](#alpha-diversity-metrics) 
  1. [Branch Length (richness and divergence)](#branch-lengths)
  2. [Pairwise distance between tips (richness, divergence, and regularity)](#pairwise-distance-between-tips)
  3. [Phylogenetic isolation (divergence, and regularity)](#phylogenetic-isolation)
  4. [Nearest Neighbor (divergence, and regularity)](#nearest-phylogenetic-neighbor) 
* [Beta Diversity](#beta-diversity)
* [Tree topology](#tree-topology)
* [Macroevolutionary rates](#macroevolutionary-rates)


All trees are ultrametric.



## Introduction and framework


  The choice of phylogenetic analyses and organizational scheme is based on the suggestions of @Tucker2016. Here are a few images from that paper for an overview:  


![From @Tucker2016](Images/Tucker_2016/tree_concept.jpg)
![From @Tucker2016](Images/Tucker_2016/alpha_PCA.jpg)

![From @Tucker2016](Images/Tucker_2016/Three_type_summary.jpg)


![From @Tucker2016](Images/Tucker_2016/tutorial.jpg)

```{r}
library(knitr)
library(phytools)
library(FARM)
library(ROCR)
library(spdep)
```



```{r}
load('~/Downloads/download.Rdata')

this_tree <- myOut$mytree
this_world <- myOut$myWorld
```

```{r}
str(this_world)
str(this_tree)
```


##Alpha diversity metrics

### Branch Lengths
  Branch length data is embedded in the tree object provided to this function. The first step in summarizing the lengths is to extract those data from the tree object. These data are called 'edges' in the tree object. We extract branch lengths and create an object called 'Branch_lengths' for passing on to the other summary functions. The histogram below shows the frequency of different branch lengths found throughout the tree. 
  
```{r}
Branch_Lengths <- this_tree$edge.length
```

```{r, echo=FALSE}
 # plot the branch lengths
hist(Branch_Lengths, xlab="Branch Lengths", main="", breaks=1000, border=NA, 
     col=adjustcolor("cornflowerblue", alpha=0.9))
```
We can summarize branch lengths according to normal summary statistics, but it can be difficult to assign evolutionary meaning to some of these metrics and so they are not regularly used as best I can tell. This lack of meaning does not mean that these statistics couldn't be used to distinguish between large simulated trees. 
```{r}
mean_branch_length <- mean(Branch_Lengths)
variance_branch_length <- var(Branch_Lengths)
SD_branch_length <- sd(Branch_Lengths)
```


```{r, echo=FALSE}
paste("mean branch length = ", mean_branch_length)
paste("variance in branch lengths = ", variance_branch_length)
paste("standard deviation in branch lengths = ", SD_branch_length)

```

```{r, echo=FALSE}
par(mar=c(2,4,0,0))
boxplot(Branch_Lengths, col=adjustcolor("cornflowerblue", alpha=0.5), 
        border=adjustcolor("cornflowerblue", alpha=1), ylab="Branch Lengths")
```

#### Phylogenetic diversity ($PD$)
Phylogenetic diversity ($PD$) is the summation ($\sum$) of all branch lengths connecting species together, where $B_{t}$ is the set of included tips and $L_{b}$ is Branch lengths [@Faith1992]. This is an anchor test, which means it is regularly used, well understood, and we should use it to anchor our work to past work. PD is a richness measure, it tells us how much evolutionary history is associated with a set of tips.  

$$PD = \sum_{b \in B_{t}}^{}L_{b}$$
 
```{r}
# Anchor test = PD (Faith's phylogenetic diversity)
Pylo_diversity_is_sum_of_BL <- sum(Branch_Lengths)
Pylo_diversity_is_sum_of_BL
```


 There are variations on this measure that we have NOT implemented here. It is popular to scale this measure according to some ecological driver. @Barker2002 scales branch lengths ($L_{b}$) by multiplying them against the abundance of individuals at at tip ($A_{b}$). Others [@Rosauer2009], scale them by their range size instead ($R_{b}$). 
 
 $$\Delta n PD = \sum_{b \in B_{t}}^{}A_{b}L_{b}$$
$$PE = \sum_{b \in B_{t}}^{}\dfrac{L_{b}}{R_{b}}$$
Argueing that proportional abundance phylogenetic diversity ($PD_{Ab}$) is more effective than the standard PD calculated from raw abundance, @Vellend2011 penned a new version of PD where $B$ is the total number of branch lengths ($L_{b}$). Note: We don't have abundance data right now for the human project so this metric is not currently very helpful.   
$$PD_{Ab} = B * \dfrac{\sum_{b \in B_{t}}^{}A_{b}L_{b}}{\sum_{b \in B_{t}}^{}A_{b}}$$

```{r}
#Calculate B
number_of_branches <- length(Branch_Lengths)
number_of_branches
```

#### Average phylogenetic diversity ($avPD$)
Average phylogenetic diversity ($avPD$) [@Clarke2001] is a branch length-based divergence indices where PD is divided by the total number of tips ($S$) in the tree. 
$$avPD = \dfrac{PD}{S}$$
```{r}
Number_of_tips <- length(this_tree$tip.label)
average_phylogenetic_diversity <- Pylo_diversity_is_sum_of_BL / Number_of_tips
average_phylogenetic_diversity
```


There is also a proportional abundance version of average phylogenetic diversity ($avPD_{Ab}$) [@Tucker2016]. Again, we don't have abundance values yet for D-place. 
$$avPD_{Ab} = \dfrac{B * \dfrac{\sum_{b \in B_{t}}^{}A_{b}L_{b}}{\sum_{b \in B_{t}}^{}A_{b}}}{S}$$


### Pairwise distance between tips

This is the patristic distance, the sum of the branch lengths following the shortest distance between two tips in a tree, implemented as a distance matrix where every tip is compared to every other tip. This distance function can be anything. We use euclidean and environmental distance matrices heavily in the spatial analyses. 

#### Calculate the patristic distance between two taxa, for all taxa
Calculate the patristic distance between two taxa using the R package 'phytools', this function takes a 'phylo' tree object and returns a distance matrix between tips. Need original citation.

```{r}
## Pairwise distance between tips - From library(ape) in library(phytools)
Pairwise_dist <- cophenetic(this_tree)
```
yields a distance matrix (list of 2D matrices) of all distances between taxa.


```{r, echo=FALSE}
str(Pairwise_dist)
```

#### Sum of all pairwise distances ($F$)
Now we can use a set of summary statistics to describe those pairwise distances. The sum of all pairwise distances, $F$, is formally called 'Extensive quadratic entropy'. [@Izsak2000]. Just as it was with branch lengths, this is a richness measure and, accordingly, should be used to answer richness questions. 

$$F = \sum_{i} \sum_{j} d_{ij}$$
```{r}
# F -- Extensive quadratic entropy
F_quadratic_entropy_is_sum_of_PD <- sum(Pairwise_dist)
F_quadratic_entropy_is_sum_of_PD
```

#### Mean pairwise distance (MPD)
Mean inter-species distances. The mean of all pairwise distances, $MPD$ (a.k.a. $AvTD$, and $\Delta^{+}$), is the mean distance between species. [@Clarke1998; @Webb2002; @Webb2008; @Kembel2010].
$$MPD = \dfrac{\sum_{ij} d_{ij}}{S(S-1)}$$

```{r}
# Anchor test = MPD (mean pairwise distance)
Mean_pairwise_distance <- 
  Pairwise_dist / (Number_of_tips * (Number_of_tips - 1) ) 
```

```{r, echo=FALSE}
str(Mean_pairwise_distance)
```

#### MPD anchored to the root 
There is an extention to mean pairwise distance calculations from @Helmus2010 called $PSV$, $PSR$, and $PSE$, phylogenetic species variability, phylogenetic species richness, and phylogenetic species evenness. These measures take the basic pairwise distance calculations and anchor them to the root of the tree so distances have a common denominator. This extention is implemented by using the same equations, just with a constrained set of $d_{ij}$ conditions. Specifically,




$$PSV = MPD = \dfrac{\sum_{ij} d_{ij}}{S(S-1)}$$
$$PSR = \sum_{i} {(\dfrac{1}{S-1} \sum_{j} {d_{ij}})}$$
$$PSE = \dfrac{S}{S-1} \sum_{ij} d_{ij}p_{i}p_{j}$$


with these specific values of $d_{ij}$

$$
d_{ji}=0.5*(c_{ii} + c_{jj} - c_{ij}) \\ 
\ \\
or \\
\ \\
d_{ij} = 1 - c_{ij} / (\sqrt{c_{ii}c_{jj}}) 
$$
and
$$
c_{ii} = the \ sum \ of \ branch \ lengths \ from \ tip \ i  \ to \ the \ root \ of \ the \ phylogenetic \ tree. \\
\ \\
c_{ij} = the \ sum \  of \ branch \ lengths \ from \ first \ common \ ancestor \ for \ i \ and \ j \ to \ the \ root.
$$



#### Average distance between two randomly chosen species
$J$, Intensive quadratic entropy, which is the average distance between two randomly chosen species [@Izsak2000]
$$J = \dfrac{\sum_{ij}d_{ij}}{S^2} $$




#### Simpson's diversity index for pairwise distance
There has been a long effort to pen a phylogenetic analogy to a Simpson's diversity index. [@Rao1982; @Clarke1998; @Pavoine2005; @Hardy2007; @Webb2002; @Webb2008; @Kembel2010]. The conclusion seems to be that this measure is equivilent to scaling $MPD$ by abundance $p_{i}$ and $p_{j}$ to get $MPD_{Ab}$. This is also a special case of Rao's Quadratic Entropy, $Roe's QE$. Note: not using abundance measures yet for D-place data. 

$$MPD_{Ab} = \sum_{i} \sum_{j} d_{ij} p_{i} p_{j}$$




#### Interspecific comparisons of pairwise distances
The interspecific variant (rather than the intraspecific default described above) defines the expected phylogenetic distance between two indivdiuals randomely drawn conditionally on the fact that they indivdulas from different species. 
$$InterMPD_{Ab} = \dfrac{\sum_{i} \sum_{j \ne i} d_{ij} p_{i} p_{j} }{\sum_{i} \sum_{j \ne i} d_{ij} p_{i} p_{j}}  $$



#### Variance in pairwise distances ($VPD$)
Variance in pairwise distances, $VPD$ (a.k.a. $VarTD$ and $\Lambda^+$), is a regularity indices. @Clarke2001 Variance is relative to tips, $S$, not to total branches ($B$ from above). These are the residuals, they compare each individual pairwise connection to the overall mean. 

$$VPD = \dfrac{1}{S(S-1)} (\sum_{i} \sum_{j \ne i} {(d_{ij} - MPD)^2})$$




```{r}
#need to adjust to equation above!

#Pairwise distance/all distances -- Variance of pairwise distances

# Anchor test = VPD (variation of pairwise distance)  

variance_pairwise_distance <- var(as.vector(Pairwise_dist))
```


Variants of $VPD$ are $VPD_{ab}$ and $InterPVD_{Ab}$, where variance is scaled by abundance or compared in and out of species. These are also regularity indices. 

$$
VPD_{Ab}= 
(\sum_{i} \sum_{j} n_{i} n_{j}) *
\dfrac{\sum_{i} \sum_{j} n_{i} n_{j} (d_{ij} - MPD_{Ab})^2}
{(\sum_{i} \sum_{j} n_{i} n_{j})^2 - \sum_{i} \sum_{j} (n_{i} n_{j})^2}
\\
or
\\
InterVPD_{Ab} = 
(\sum_{i} \sum_{j \ne i} n_{i} n_{j}) *
\dfrac{\sum_{i} \sum_{j \ne i} n_{i} n_{j} (d_{ij} - InterMPD_{Ab})^2}
{(\sum_{i} \sum_{j \ne i} n_{i} n_{j})^2 - \sum_{i} \sum_{j \ne i} (n_{i} n_{j})^2}
$$

### Nearest phylogenetic neighbor

#### Divergence indices
Divergence indices using nearest distance: $MNTD$ and $MNTD_{Ab}$, Mean nearest taxon distance and Abundance-weighted MNTD [@Webb2002; @Webb2008; @Kembel2010]. 

$MNTD$, mean nearest taxon distance, is the mean shortest distance from a species to all other in the assemblage [@Webb2002; @Webb2008; @Kembel2010].  
$$
MNTD = 
\dfrac{1}{S}
\sum_{i}
d_{i_{min}}
$$

$MNTD_{AB}$, abundance adjusted mean nearest taxon distance. Adjusted by species proportions (i.e. species' relative abundances) [@Webb2002; @Webb2008; @Kembel2010]   
$$
MNTD_{Ab} = 
\sum_{i=1}^{S}
[d_{i_{min}} * p_{i}]
$$

#### Regularity indices
Regularity indices using nearest distances: $VNTD$, $VNTD_{Ab}$, $PE_{ev}$.  

$VNTD$, Variance in nearest taxon distances, is the variance in nearest pairwise distance [@Tucker2016].
$$
VNTD = \dfrac{1}{S}
\sum_{i-1}^{S}
[(d_{i_{min}} - MNTD)^2]
$$
$VNTD_{Ab}$, Abundance weighted variance in nearest taxon distances, is scales by abundance in the same way as descried above [@Tucker2016].
$$
VNTD_{Ab} = \dfrac
{(\sum_{i} n_{i}) \sum_{i} n_{i} (d_{i_{min}} - MNTD_{Ab})^2}
{(\sum_{i} n_{i})^2 - \sum_{i} n_{i} ^2}
$$

#### Phylogenetic version of the funtional $FE_{ve}$ index
$PE_{ve}$, phylogenetic evenness is a phylogenetic version of the funtional $FE_{ve}$ index. First a minimum spanning tree ($MST$) is computed using the cophenetic distance obtained from the phylogenetic tree. The $MST$ contains $S-1$ Branches connection the $S$ species. We denote $l$ a branch on the $MST$, $dist(i,j)$ is the length the branch $l$ that connects species $i$ and $j$. $n_{i}$ is, as defined above, the abundance of species $i$ in the asseblage [@Villeger2008; @Dehling2014]. 

$$
Weighted \ evenness: \\
EW_{i} = \dfrac{dist(i,j)}
{(n_{i} + n_{j})/(\sum_{k=1}^{S}n_{k})} \\
\ \\
Partial \ weighted \ evenness: \\ 
PEW_{l} = \dfrac
{EW_{l}}
{\sum_{l=1}^{S-1} EW_{l}} \\
\ \\
Phylogenetic \ evenness: \\
PE_{ve} = \dfrac
{\sum_{l=1}^{S-1} min(PEW_{l}, \dfrac{1}{S-1}) - (\dfrac{1}{S-1})}
{1- (\dfrac{1}{S-1})}
$$

### Phylogenetic isolation
A phylogenetic isolation index represents the relative isolation of a given species within a phylogenetic tree. Several indices have been proposed so far but we focus here on the evolutionary distinctiveness index called ‘Fair Proportion’ as proposed by @Redding2003 and @Isaac2007. 

#### Evolutionary distinctiveness (richness indices)
$ED$, evolutionary distinctiveness is a richness indices. NOTE: not equal to Faith's PD because the $ED_{i}$ are computed from the regional pool of species and sumed across a given assemblage (i.e. a subset of the regional species pool) [@Tucker2016; @Safi2013a; @Redding2003; @Isaac2007]. 

$$
  ED = \sum_{i}ED_{i} \\
  \ \\ 
  where \ ED_{i} = \sum_{b \in B_{t_{i}}} \dfrac{L_{b}}{S_{b}} 
$$

$AED$, Abundance-weighted $ED$ [@Tucker2016; @Cadotte2010]. 
$$
\sum_{i} AED_{i} \\
\ \\
where \ AED_{i} = \sum_{b \in B_{t_{i}}} \dfrac{L_{b}}{A_{b} }* p_{i}
$$



```{r}

# Bruno's function for ED. Provided in library(FARM)

evol.distinct2 <- function (tree, type = c("equal.splits", "fair.proportion"), 
    scale = FALSE, use.branch.lengths = TRUE) 
{
    type <- match.arg(type)
    if (is.rooted(tree) == FALSE) 
        warning("A rooted phylogeny is required for meaningful output of this function", 
            call. = FALSE)
    if (scale == TRUE) {
        if (is.ultrametric(tree) == TRUE) 
            tree$edge.length <- tree$edge.length/(as.numeric(branching.times(tree)[1]))
        else tree$edge.length <- tree$edge.length/sum(tree$edge.length)
    }
    if (use.branch.lengths == FALSE) 
        tree$edge.length <- rep(1, length(tree$edge.length))
    for (i in 1:length(tree$tip.label)) {
        spp <- tree$tip.label[i]
        nodes <- .get.nodes(tree, spp)
        nodes <- nodes[1:(length(nodes) - 1)]
        internal.brlen <- tree$edge.length[which(tree$edge[, 
            2] %in% nodes)]
        if (length(internal.brlen) != 0) {
            internal.brlen <- internal.brlen * switch(type, equal.splits = sort(rep(0.5, 
                length(internal.brlen))^c(1:length(internal.brlen))), 
                fair.proportion = {
                  for (j in 1:length(nodes)) {
                    sons <- .node.desc(tree, nodes[j])
                    n.descendents <- length(sons$tips)
                    if (j == 1) portion <- n.descendents else portion <- c(n.descendents, 
                      portion)
                  }
                  1/portion
                })
        }
        ED <- sum(internal.brlen, tree$edge.length[which.edge(tree, 
            spp)])
        if (i == 1) 
            w <- ED
        else w <- c(w, ED)
    }
    return(w)
}

```

Evolutionary distinctiveness is our basic measure of phylogenetic isolation. #This should likely be 'fair proportions' instead of 'equal.splits'.


```{r}

# Calculate ED
# Using equal.splits method, faster computation
# Evolutionary_distinctiveness_i <- evol.distinct2(this_tree, type = "equal.splits")  

# ED - Summed evolutionary distinctiveness
# Evolutionary_distinctiveness_sum <- sum(Evolutionary_distinctiveness_i)
```


```{r}
#Evolutionary_distinctiveness_sum
```



We can run some standard summary statistics (mean and variance) on this ED measure. var(Ed) shows up close to VPD on the PCAs in the intro [@Tucker2016].

```{r}
# mean(ED)
# mean_Phylogenetic_isolation <- mean(Evolutionary_distinctiveness_i)

# var(ED)
#variance_Phylogenetic_isolation <- var(Evolutionary_distinctiveness_i)
```



```{r}
#mean_Phylogenetic_isolation
#variance_Phylogenetic_isolation
```

#### Mean evolutionary distinctiveness (divergence indices)
The divergence indices version for $ED$ is mean evolutionary distinctiveness, $MED$. The mean of evolutionary distinctiveness [@Redding2003; @Isaac2007]. 
$$
MED = \dfrac
{\sum_{i} ED_{i}}
{S} \\
\ \\
with \\
\ \\
ED_{i} = \sum_{b \in B_{t_{i}}} \dfrac{L_{b}}{S_{b}}
$$
#### Entropy measure of evolutonary distinctiveness (regularity indices)
The regularity indices for $ED$/phylogenetic isolation are $H_{ED}$, $E_{ED}$, $var(ED)$, $H_{AED}$

$H_{ED}$, Entropy measure of evolutionary distinctiveness, is the shannon index applied to evolutionary distinctiveness values [@Cadotte2010]. 
$$
H_{ED} = -\sum_{i=1}^{S}
((\dfrac{ED_{i}}{\sum_{i=1}^{S} ED_{i}})
* \ln (\dfrac{ED_{i}}{\sum_{i=1}^{S} ED_{i}}))
$$

$E_{ED}$, Equitability of evolutionary distinctiveness, is $H_{ED}$ controlled for species richness [@Cadotte2010]. 

$$
E_{ED} = \dfrac{H_{ED}}{\ln(S)}
$$
$var(ED)$, Variance in evolutinoary distinctiveness, is the variance of species evolutionary distinctiveness [@Tucker2016]. 

$$
var(ED) = \dfrac{1}{S-1} *
\sum_{i=1}^{S}
(ED_{i}-\dfrac{\sum_{i=1}^{S} ED_{i}}{S})^2
$$
$H_{ED_{Ab}}$, Abundance-weighted version of $H_{ED}$ [@Cadotte2010].

$$
H_{ED_{Ab}} = -\sum_{i=1}^{S}
(\dfrac{n_{i}AED_{i}}{\sum_{i=1}^{S} n_{i}AED_{i}} *
\ln(\dfrac{n_{i}AED_{i}}{\sum_{i=1}^{S} n_{i}AED_{i}}))
$$


## Beta diversity
We currently are not using any beta diversity metrics but there are many to choose from if we decide to add them later. 



## Tree topology
Tree topology is a measure of the shape of the overall tree. The tree can be lopsided side-to-side or front-to-back. 

Our most trusted index for the tippy vs trunky of a tree is the gamma index, $\gamma$.The index characterizes the distribution of branching events within the tree. Trees with γ < 0 have relatively longer branches towards the tips of the phylogeny (tippy trees), whereas trees with γ > 0 have relatively longer inter-nodal distances towards the root of the phylogeny (stemmy trees).  tk represents an ‘evolutionary period’ (limits are given by two speciation events) or equivalently an internode distance [@Pybus2000]. 

$$
  \gamma = \dfrac
  {(\dfrac{1}{S-2}* \sum_{i=2}^{S-1} (\sum_{k=2}^{i} Kt_{k}))- \dfrac{1}{2} * \sum_{j=2}^{S} jt_{j}}
  {(\sum_{j=2}^{S} jt_{j}) * \sqrt{\dfrac{1}{12*(S-2)}}}
$$


```{r}
# ltt function from library(phytools)
ltt <- function (tree, plot = TRUE, drop.extinct = FALSE, log.lineages = TRUE, 
    gamma = TRUE, ...) 
{
    tol <- 1e-06
    if (!inherits(tree, "phylo") && !inherits(tree, "multiPhylo")) 
        stop("tree must be object of class \"phylo\" or \"multiPhylo\".")
    if (inherits(tree, "multiPhylo")) {
        obj <- lapply(tree, ltt, plot = FALSE, drop.extinct = drop.extinct, 
            log.lineages = log.lineages, gamma = gamma)
        class(obj) <- "multiLtt"
    }
    else {
        tree <- reorder.phylo(tree, order = "cladewise")
        if (!is.null(tree$node.label)) {
            node.names <- setNames(tree$node.label, 1:tree$Nnode + 
                Ntip(tree))
            tree$node.label <- NULL
        }
        else node.names <- NULL
        if (is.ultrametric(tree)) {
            h <- max(nodeHeights(tree))
            time <- c(0, h - sort(branching.times(tree), decreasing = TRUE), 
                h)
            nodes <- as.numeric(names(time)[2:(length(time) - 
                1)])
            ltt <- c(cumsum(c(1, sapply(nodes, function(x, y) sum(y == 
                x) - 1, y = tree$edge[, 1]))), length(tree$tip.label))
            names(ltt) <- names(time)
        }
        else {
            drop.extinct.tips <- function(phy) {
                temp <- diag(vcv(phy))
                if (length(temp[temp < (max(temp) - tol)]) > 
                  0) 
                  pruned.phy <- drop.tip(phy, names(temp[temp < 
                    (max(temp) - tol)]))
                else pruned.phy <- phy
                return(pruned.phy)
            }
            if (drop.extinct == TRUE) 
                tree <- drop.extinct.tips(tree)
            root <- length(tree$tip) + 1
            node.height <- matrix(NA, nrow(tree$edge), 2)
            for (i in 1:nrow(tree$edge)) {
                if (tree$edge[i, 1] == root) {
                  node.height[i, 1] <- 0
                  node.height[i, 2] <- tree$edge.length[i]
                }
                else {
                  node.height[i, 1] <- node.height[match(tree$edge[i, 
                    1], tree$edge[, 2]), 2]
                  node.height[i, 2] <- node.height[i, 1] + tree$edge.length[i]
                }
            }
            ltt <- vector()
            tree.length <- max(node.height)
            n.extinct <- sum(node.height[tree$edge[, 2] <= length(tree$tip), 
                2] < (tree.length - tol))
            node.height[tree$edge[, 2] <= length(tree$tip), 2] <- node.height[tree$edge[, 
                2] <= length(tree$tip), 2] + 1.1 * tol
            time <- c(0, node.height[, 2])
            names(time) <- as.character(c(root, tree$edge[, 2]))
            temp <- vector()
            time <- time[order(time)]
            time <- time[1:(tree$Nnode + n.extinct + 1)]
            for (i in 1:(length(time) - 1)) {
                ltt[i] <- 0
                for (j in 1:nrow(node.height)) ltt[i] <- ltt[i] + 
                  (time[i] >= (node.height[j, 1] - tol) && time[i] <= 
                    (node.height[j, 2] - tol))
            }
            ltt[i + 1] <- 0
            for (j in 1:nrow(node.height)) ltt[i + 1] <- ltt[i + 
                1] + (time[i + 1] <= (node.height[j, 2] + tol))
            names(ltt) <- names(time)
            ltt <- c(1, ltt)
            time <- c(0, time)
            time[length(time)] <- time[length(time)] - 1.1 * 
                tol
        }
        if (!is.null(node.names)) {
            nn <- sapply(names(time), function(x, y) if (any(names(y) == 
                x)) 
                y[which(names(y) == x)]
            else "", y = node.names)
            names(ltt) <- names(time) <- nn
        }
        if (gamma == FALSE) {
            obj <- list(ltt = ltt, times = time, tree = tree)
            class(obj) <- "ltt"
        }
        else {
            gam <- gammatest(list(ltt = ltt, times = time))
            obj <- list(ltt = ltt, times = time, gamma = gam$gamma, 
                p = gam$p, tree = tree)
            class(obj) <- "ltt"
        }
    }
    if (plot) 
        plot(obj, log.lineages = log.lineages, ...)
    obj
}
<environment: namespace:phytools>
```



```{r}
ltts <- ltt(this_tree, gamma = TRUE, plot = FALSE)
ltts
str(ltts)
```




```{r}
lineages_through_time <- as.numeric(ltts[[1]])
time_steps <- as.numeric(ltts[[2]])
#extract Gamma index
gamma <- ltts[[3]]
gamma_p_value <- ltts[[4]]
```


```{r}
lineages_through_time 
time_steps 
gamma 
gamma_p_value 
```


There are two other regularly used metrics that include abundance measures. Note: we don't have abundance measures for D-place data. 

$IAC$, imbalance of abundance at the clade level, quantifies the relative deviation in the abundance distribution from a null case where individuals are evenly partitioned between clade splits. $v$ is the number of nodes in the phylogenetic tree.  $n_{i}$ is, as defined above, the abundance of species $i$ in the assemblage.  $\eta_{k}$ is the expected abundance species $i$ would have if the abundance was randomly split among lineages in the phylogenetic tree at each speciation event.  is the number of lineages originating at node $k$ in the set $s(k,root)$, which contains the nodes located on the path between node $k$ and the root of the phylogenetic tree. N is the total assemblage abundance [@Cadotte2010]. 

$$
\dfrac{\sum_{i=1}^{S}  |n_{i} - \hat{n_{i}}|}
{v} \\
\ \\
where \\ 
\ \\
\hat{n_{i}} = \dfrac{N}{\prod_{K \in s(i, root)}\eta_{k}}
$$

$I_{c}$, the Colless index, is the sum of the absolute differences in species richness between sister-clades at each internal node. For fully resolved trees, each internal node defines two sister-clades. $S_{1k}$ is the number of species descending from the first clade defined by node k and $S_{2k}$ that of the second clade. $v$ is, as defined above, the number of nodes in the phylogenetic [@Colless1982]. 

$$
I_{c} = \sum_{k=1}^{v} |S_{1k} - S_{2k}|
$$




## Macroevolutionary rates

```{r}

#function name = bd, function input = tree of type 'phylo'
bd <- function (tree) 
{
    tree$edge.length <- tree$edge.length/max(tree$edge.length)
    x <- birthdeath(tree)
    b <- x$para[2]/(1 - x$para[1])
    d <- b - x$para[2]
    c(setNames(c(b, d), c("b", "d")), x$para)
}
<environment: namespace:FARM>

```



```{r}
 ## Speciation vs extinction rates and Net diversification
bds <- bd(this_tree)
speciation_rate <- bds[1]
extinction_rate <- bds[2]
extinction_per_speciation <- bds[3]
speciation_minus_extinction <- bds[4]
  
```


```{r}
## Speciation vs extinction rates and Net diversification dependent on trait
# N.for.dom <- table(this_world[, 6])
#    if(length(N.for.dom) == 2) {
par.div.dep <- DivDep( mytree = this_tree, myWorld = this_world)
trait_1_speciation <- par.div.dep[1]
trait_2_speciation <- par.div.dep[2]
trait_1_extinction <- par.div.dep[3]
trait_2_extinction <- par.div.dep[4]
transition_from_trait_1_to_2 <- par.div.dep[5]
transition_from_trait_2_to_1 <- par.div.dep[6]
transition_rate_ratio_1to2_over_2to1 <- transition_from_trait_1_to_2/transition_from_trait_2_to_1
      
```




```{r}

## Crown age per trait AUC and effect size
tip.length <- this_tree$edge.length[this_tree$edge[, 2] %in% 1:Ntip(this_tree)]
tip.length <- (tip.length - min(tip.length)) / (max(tip.length) - min(tip.length))
this_trait <- this_world[match(this_tree$tip.label, this_world[, 8]), 6]
tip.length.2 <- tip.length[this_trait == 2]
tip.length.1 <- tip.length[this_trait == 1]
model <- glm(as.factor(this_trait) ~ log(tip.length + 1),
             family = "binomial")
effect.size <- model$coefficients[2]
# plot(y = this_trait - 1, x= log(tip.length))
p <- predict(model, as.factor(this_trait), type = "resp")
# points(y = p, x = log(tip.length), col = "red")
pr <- prediction(p, as.factor(this_trait))
auc.model <- performance(pr, measure = "auc")@y.values[[1]]

```



```{r}
## Phylogenetic signal (D)
Phylogenetic_signal <- Dsig(mytree = this_tree, myWorld = this_world)

```


# Spatial Locations

```{r}

## Spatial Analysis
nbs0 <- knearneigh(as.matrix(this_world[, 2:3]), k = 7, longlat = TRUE)
nbs <- knn2nb(nbs0, sym = TRUE) # 7 symmetric neighbors
nbs.listw <- nb2listw(nbs)
factors.nbs <- as.factor(ifelse(is.na(this_world[, 6]), 3, this_world[, 6]))
spatial.tests <- joincount.test(fx = factors.nbs, listw = nbs.listw)
spatial.tests.fora <- spatial.tests[[1]]$statistic
spatial.tests.dom <- spatial.tests[[2]]$statistic
#prevalence <- (N.for.dom[1] - N.for.dom[2]) / sum(N.for.dom)
```


```{r}
results_summary_matrix_1 <- cbind(

        number_of_branches,
        #Pylo_diversity_is_sum_of_BL,
        #average_phylogenetic_diversity_is_mean_of_BL,
        #variance_Pylo_diversity_is_variance_of_BL,

        F_quadratic_entropy_is_sum_of_PD,
        Mean_pairwise_distance,
        variance_pairwise_distance,

        #Evolutionary_distinctiveness_sum,
        #mean_Phylogenetic_isolation,
        #variance_Phylogenetic_isolation,

        gamma,
        gamma_p_value,
        speciation_rate,
        extinction_rate,
        extinction_per_speciation,
        speciation_minus_extinction,
        trait_1_speciation,
        trait_2_speciation ,
        trait_1_extinction ,
        trait_2_extinction ,
        transition_from_trait_1_to_2 ,
        transition_from_trait_2_to_1 ,
        transition_rate_ratio_1to2_over_2to1 ,
        Phylogenetic_signal,
        spatial.tests.fora,
        spatial.tests.dom,
       # prevalence,
       # auc.model,
        effect.size
      )
      #rownames(results_summary_matrix_1) <- 1

      #results_summary_matrix_2 <- cbind(
      #  c(Evolutionary_distinctiveness,NA),
      #  lineages_through_time,
      #  time_steps
      #)
      #colnames(results_summary_matrix_2) <- c("Evolutionary_distinctiveness", "lineages_through_time", "time_steps")
      #head(results_summary_matrix_2)

      ### Returns from function in list form
      #returns <- list(
        #Branch_Lengths,
        #Pairwise_dist,
      #  results_summary_matrix_1,
      #  results_summary_matrix_2

      #)

      #names(returns) <- c(
        #"Branch_Lengths",
        #"Pairwise_distance",
       # "results_summary_of_single_value_outputs",
       # "results_summary_matrix_of_multi_value_outputs"
      #)
      
      
```



## Module2() returns these two matrices as a list 



```{r, echo=FALSE}
#kable(returns$results_summary_of_single_value_outputs, caption= "This is our world")
     
```


```{r, echo=FALSE}
 #kable(returns$results_summary_matrix_of_multi_value_outputs, caption= "This is our world")
```


# Here is the exact version in R
```{r}
## This module analyzes the results from module 1 and returns a list based on how many values each stat returns
## Ty Tuff and Bruno Vilela
## 24 August 2016

###### Specify function ##############################

Module_2 <- function(Module_1_output) {
  cat("\nAnalyzing: 0% [")
  if (any(is.na(Module_1_output))) {
    cat("----------]")
    return(NA)
  } else {

    this_tree <- Module_1_output$mytree
    this_world <- Module_1_output$myWorld


    ##### (0) Pull necessary variables from simulated trees and organize into a single object for all the tests below to pull from.

    #str(all_trees)
    #str(this_tree)


    ## 0a) Branch lengths
    Branch_Lengths <- this_tree$edge.length
    number_of_branches <- length(Branch_Lengths)

    # Anchor test = PD (Faith's phylogenetic diversity)
    Pylo_diversity_is_sum_of_BL <- sum(Branch_Lengths)

    # avPD -- Average phylogenetic diversity
    average_phylogenetic_diversity_is_mean_of_BL <- mean(Branch_Lengths)

    variance_Pylo_diversity_is_variance_of_BL <- var(Branch_Lengths)
    cat("-")
    ## 0b) Pairwise distance between tips
    Pairwise_dist <- cophenetic.phylo(this_tree)
    cat("-")
    # 2b) Pairwise distance -- Sum of pairwise distances

    # F -- Extensive quadratic entropy
    F_quadratic_entropy_is_sum_of_PD <- sum(Pairwise_dist)

    #Mean inter-species distances

    # Anchor test = MPD (mean pairwise distance)

    Mean_pairwise_distance <- mean(Pairwise_dist)

    cat("-")
    #Pairwise distance/all distances -- Variance of pairwise distances

    # Anchor test = VPD (variation of pairwise distance)

    variance_pairwise_distance <- var(as.vector(Pairwise_dist))




    ## 0c) Phylogenetic isolation

    # Using equal.splits method, faster computation
    Evolutionary_distinctiveness <- evol.distinct2(this_tree, type = "fair.proportion")
    cat("-")
    # ED - Summed evolutionary distinctiveness

    Evolutionary_distinctiveness_sum <- sum(Evolutionary_distinctiveness)

    ## 3d) Phylogenetic isolation -- Mean of species evolutionary distinctiveness

    # mean(ED)

    mean_Phylogenetic_isolation <- mean(Evolutionary_distinctiveness)

    ## 4d) Phylogenetic isolation -- Variance of species isolation metrics

    #var(ED)

    variance_Phylogenetic_isolation <- var(Evolutionary_distinctiveness)
    cat("-")

    ## Tree topology

    #Gamma index

    ltts <- ltt(this_tree, gamma = TRUE, plot = FALSE)
    lineages_through_time <- as.numeric(ltts[[1]])
    time_steps <- as.numeric(ltts[[2]])
    gamma <- ltts[[3]]
    gamma_p_value <- ltts[[4]]
    cat("-")

    colless_stat <- colless(as.treeshape(this_tree))
    sackin_index <- sackin(as.treeshape(this_tree))
    tree_shape_stat <- shape.statistic(as.treeshape(this_tree))

    ##### (5) Tree metric -- Macroevolutionary - Rate and rate changes ###############
    ##################################################

    ## Speciation vs extinction rates and Net diversification
    bds <- bd(this_tree)
    speciation_rate <- bds[1]
    extinction_rate <- bds[2]
    extinction_per_speciation <- bds[3]
    speciation_minus_extinction <- bds[4]
    cat("-")


    ## Speciation vs extinction rates and Net diversification dependent on trait
    N.for.dom <- table(this_world[, 6])
    if(length(N.for.dom) == 2) {
      par.div.dep <- DivDep( mytree = this_tree, myWorld = this_world)
      trait_1_speciation <- par.div.dep[1]
      trait_2_speciation <- par.div.dep[2]
      trait_1_extinction <- par.div.dep[3]
      trait_2_extinction <- par.div.dep[4]
      transition_from_trait_1_to_2 <- par.div.dep[5]
      transition_from_trait_2_to_1 <- par.div.dep[6]
      transition_rate_ratio_1to2_over_2to1 <- transition_from_trait_1_to_2/transition_from_trait_2_to_1
      cat("-")

      ## Phylogenetic signal (D)
      Phylogenetic_signal <- Dsig(mytree = this_tree, myWorld = this_world)
      cat("-")

      ## Spatial Analysis
      nbs0 <- knearneigh(as.matrix(this_world[, 2:3]), k = 7, longlat = TRUE)
      nbs <- knn2nb(nbs0, sym = TRUE) # 7 symmetric neighbors
      nbs.listw <- nb2listw(nbs)
      factors.nbs <- as.factor(ifelse(is.na(this_world[, 6]), 3, this_world[, 6]))
      spatial.tests <- joincount.test(fx = factors.nbs, listw = nbs.listw)
      spatial.tests.fora <- spatial.tests[[1]]$statistic
      spatial.tests.dom <- spatial.tests[[2]]$statistic
      prevalence <- (N.for.dom[1] - N.for.dom[2]) / sum(N.for.dom)
      cat("-")
    } else {
      trait_1_speciation <- NA
      trait_2_speciation <- NA
      trait_1_extinction <- NA
      trait_2_extinction <- NA
      transition_from_trait_1_to_2 <- NA
      transition_from_trait_2_to_1 <- NA
      transition_rate_ratio_1to2_over_2to1 <- NA
      Phylogenetic_signal <- NA
      spatial.tests.fora <- NA
      spatial.tests.dom <- NA
      prevalence <- ifelse(names(table(this_world[, 6])[1]) == "1", 1,
                           -1)
      cat("---")

    }




    results_summary_matrix_1 <- cbind(

      number_of_branches,
      Pylo_diversity_is_sum_of_BL,
      average_phylogenetic_diversity_is_mean_of_BL,
      variance_Pylo_diversity_is_variance_of_BL,

      F_quadratic_entropy_is_sum_of_PD,
      Mean_pairwise_distance,
      variance_pairwise_distance,

      colless_stat ,
      sackin_index ,
      tree_shape_stat,

      Evolutionary_distinctiveness_sum,
      mean_Phylogenetic_isolation,
      variance_Phylogenetic_isolation,

      gamma,
      gamma_p_value,
      speciation_rate,
      extinction_rate,
      extinction_per_speciation,
      speciation_minus_extinction,
      trait_1_speciation,
      trait_2_speciation ,
      trait_1_extinction ,
      trait_2_extinction ,
      transition_from_trait_1_to_2 ,
      transition_from_trait_2_to_1 ,
      transition_rate_ratio_1to2_over_2to1 ,
      Phylogenetic_signal,
      spatial.tests.fora,
      spatial.tests.dom,
      prevalence
    )
    rownames(results_summary_matrix_1) <- 1

    results_summary_matrix_2 <- cbind(
      c(Evolutionary_distinctiveness,NA),
      lineages_through_time,
      time_steps
    )
    colnames(results_summary_matrix_2) <- c("Evolutionary_distinctiveness",
                                            "lineages_through_time", "time_steps")
    head(results_summary_matrix_2)

    ### Returns from function in list form
    returns <- list(
      #Branch_Lengths,
      #Pairwise_dist,
      results_summary_matrix_1,
      results_summary_matrix_2

    )

    names(returns) <- c(
      #"Branch_Lengths",
      #"Pairwise_distance",
      "results_summary_of_single_value_outputs",
      "results_summary_matrix_of_multi_value_outputs"
    )
    cat("] 100%")

    return(returns)

  }
}


#Module_2(myOut)

```


## References



<!--chapter:end:Module_2_markdown.Rmd-->

---
title: "D-place FARM documentation: Module 3 through time"
author: "Ty Tuff, Bruno Vilela, and Carlos Botero"
date: 'project began: 15 May 2016, document updated: `r strftime(Sys.time(), format
  = "%d %B %Y")`'
output:
  html_notebook: default
  html_document: default
  pdf_document: default
  word_document: default
bibliography: FARM package.bib
---


```{r}
setwd("~/Box Sync/colliding ranges/Simulations_humans/Results/RF_daily_output")

details <- file.info(list.files())

trimmed_details <- details[which(list.files() == list.files(pattern = "Four_model_compare_results_extinct*")),]
ord <- order(trimmed_details$mtime, decreasing = TRUE)
rownames(trimmed_details[ord,])[1]
load(rownames(trimmed_details[ord,])[1])
extinct <- Concatenated_data




```


```{r}
str(fit)
```


```{r}
fit$confusion
as.vector(fit$confusion)
as.vector(fit$test$confusion)

```


```{r}
fit$importance[,5]
fit$importanceSD[,5]
```



```{r}


str(fit)
fit$confusion
plot(fit$err.rate[,1])
lines(fit$err.rate[,2])
lines(fit$err.rate[,3])
lines(fit$err.rate[,4])
lines(fit$err.rate[,5])


```

<!--chapter:end:Module_3_across_subsetted_sample_sizes.Rmd-->

---
title: "D-place FARM documentation: Module 3"
author: "Ty Tuff, Bruno Vilela, and Carlos Botero"
date: 'project began: 15 May 2016, document updated: `r strftime(Sys.time(), format
  = "%d %B %Y")`'
output:
  html_notebook: default
  html_document: default
  pdf_document: default
  word_document: default
bibliography: FARM package.bib
---
```{r}
library(png)
```



```{r}
## First consolidate the available files into a single table
    
      path <- "~/Box Sync/Four model compare/Module 2"
           
     
           setwd(path)
    myfiles_full <- list.dirs()
    analyze_this_many <- length(myfiles_full)
    
    available_files <- matrix(NA, 1, 1)
    
        
    for(i in 1: analyze_this_many){
    available_files <- rbind(available_files , as.matrix(list.files(myfiles_full[i], full.names = TRUE)))
    }
    dim(available_files)
    
    split.file.name <- strsplit(available_files[10], split = "_") 
    
    
    
 
available <- list.files()
files <- matrix(rep(NA, 62), length(available), 62)
dim(files)
i <- 10


for(i in 1:length(available)){
load(available[i])
name <- unlist(strsplit(available[i], split="_"))
files[i,] <- c(as.vector(matrix(name, 1,35)),matrix(Sim_statistics[[1]], 1, 27))

}


colnames(files) <-  c(

	NA,
	"background_takeover_type" ,
	NA,
	"replicate",
	NA,
	"Model_type",
	rep(NA,2),
	"speciation_of_Env_NonD",
	"speciation_of_Env_D",
	"speciation_of_For",
	"speciation_of_Dom",
	NA,
	"extinction_of_Env_NonD",
	"extinction_of_Env_D",
	"extinction_of_For",
	"extinction_of_Dom",
	NA,
	"P.diffusion_Target_forager",
	"P.diffusion_Target_domesticator",
	"P.diffusion_Source_forager",
	"P.diffusion_Source_domesticator",
	NA,
	"P.takeover_Target_forager",
	"P.takeover_Target_domesticator",
	"P.takeover_Source_forager",
	"P.takeover_Source_domesticator",
	NA,
	"arisal_of_Env_NonD",
	"arisal_of_Env_D",
	"arisal_of_For",
	"arisal_of_Dom",
	
	NA, 
	"timesteps", 
	NA,
        
    "number_of_branches",
	"Pylo_diversity_is_sum_of_BL",
	"average_phylogenetic_diversity_is_mean_of_BL",
	"variance_Pylo_diversity_is_variance_of_BL",

	"F_quadratic_entropy_is_sum_of_PD",
	"Mean_pairwise_distance",
	"variance_pairwise_distance",

	"Evolutionary_distinctiveness_sum",
	"mean_Phylogenetic_isolation",
	"variance_Phylogenetic_isolation",

	"gamma",
	"gamma_p_value",
	"speciation_rate",
	"extinction_rate",
	"extinction_per_speciation",
	"speciation_minus_extinction",
	"trait_1_speciation",
  	"trait_2_speciation" ,
  	"trait_1_extinction" ,
  	"trait_2_extinction" ,
  	"transition_from_trait_1_to_2" ,
  	"transition_from_trait_2_to_1" ,
  	"transition_rate_ratio_1to2_over_2to1" ,
  	"Phylogenetic_signal",
  	"spatial.tests.fora",
  	"spatial.tests.dom",
  	"prevalence"
  	
    
  )

results_table <- as.data.frame(files)
head(results_table)
dim(results_table)
Concatenated_data <- results_table
save(Concatenated_data, file="~/Desktop/Four_model_compare_results.Rdata")

one <- subset(results_table, Model_type=="01" )
two <- subset(results_table, Model_type=="02" )
three <- subset(results_table, Model_type=="03" )
four <- subset(results_table, Model_type=="04" )
crop <- min(length(one[,1]),
length(two[,1]),
length(three[,1]),
length(four[,1]))
one <- one[1:crop,]
two <- two[1:crop,]
three <- three[1:crop,]
four <- four[1:crop,]

Concatenated_data <- rbind(one, two, three, four)
dim(Concatenated_data)





save(Concatenated_data, file=paste0("~/Box Sync/colliding ranges/Simulations_humans/Results/available daily summaries/Four_model_compare_results", format(Sys.time(), format="%d_%b_%Y"),"_crop_to_", crop,".Rdata"))
crop

```




```{r}
## First consolidate the available files into a single table
    
      path <- "~/Box Sync/Four model compare/Module 2 extinct"
           
     
           setwd(path)
    myfiles_full <- list.dirs()
    analyze_this_many <- length(myfiles_full)
    
    available_files <- matrix(NA, 1, 1)
    
        
    for(i in 1: analyze_this_many){
    available_files <- rbind(available_files , as.matrix(list.files(myfiles_full[i], full.names = TRUE)))
    }
    dim(available_files)
    
    split.file.name <- strsplit(available_files[10], split = "_") 
    
    
    
 
available <- list.files()
files <- matrix(rep(NA, 62), length(available), 62)
dim(files)
i <- 10


for(i in 1:length(available)){
load(available[i])
name <- unlist(strsplit(available[i], split="_"))
files[i,] <- c(as.vector(matrix(name, 1,35)),matrix(Sim_statistics[[1]], 1, 27))

}


colnames(files) <-  c(

	NA,
	"background_takeover_type" ,
	NA,
	"replicate",
	NA,
	"Model_type",
	rep(NA,2),
	"speciation_of_Env_NonD",
	"speciation_of_Env_D",
	"speciation_of_For",
	"speciation_of_Dom",
	NA,
	"extinction_of_Env_NonD",
	"extinction_of_Env_D",
	"extinction_of_For",
	"extinction_of_Dom",
	NA,
	"P.diffusion_Target_forager",
	"P.diffusion_Target_domesticator",
	"P.diffusion_Source_forager",
	"P.diffusion_Source_domesticator",
	NA,
	"P.takeover_Target_forager",
	"P.takeover_Target_domesticator",
	"P.takeover_Source_forager",
	"P.takeover_Source_domesticator",
	NA,
	"arisal_of_Env_NonD",
	"arisal_of_Env_D",
	"arisal_of_For",
	"arisal_of_Dom",
	
	NA, 
	"timesteps", 
	NA,
        
    "number_of_branches",
	"Pylo_diversity_is_sum_of_BL",
	"average_phylogenetic_diversity_is_mean_of_BL",
	"variance_Pylo_diversity_is_variance_of_BL",

	"F_quadratic_entropy_is_sum_of_PD",
	"Mean_pairwise_distance",
	"variance_pairwise_distance",

	"Evolutionary_distinctiveness_sum",
	"mean_Phylogenetic_isolation",
	"variance_Phylogenetic_isolation",

	"gamma",
	"gamma_p_value",
	"speciation_rate",
	"extinction_rate",
	"extinction_per_speciation",
	"speciation_minus_extinction",
	"trait_1_speciation",
  	"trait_2_speciation" ,
  	"trait_1_extinction" ,
  	"trait_2_extinction" ,
  	"transition_from_trait_1_to_2" ,
  	"transition_from_trait_2_to_1" ,
  	"transition_rate_ratio_1to2_over_2to1" ,
  	"Phylogenetic_signal",
  	"spatial.tests.fora",
  	"spatial.tests.dom",
  	"prevalence"
  	
    
  )

Concatenated_data <- as.data.frame(files)
head(Concatenated_data)
dim(Concatenated_data)

save(Concatenated_data, file=paste0("~/Box Sync/colliding ranges/Simulations_humans/Results/available daily summaries/Four_model_compare_results_extinct_", format(Sys.time(), format="%d_%b_%Y"),"_crop_to_", crop,".Rdata"))

```


```{r}
load('~/Box Sync/colliding ranges/Simulations_humans/Results/available daily summaries/Four_model_compare_results_02_Mar_2017_crop_to_3481.Rdata')
extant <- Concatenated_data
extant
```




```{r}
setwd("~/Box Sync/colliding ranges/Simulations_humans/Results/available daily summaries")
details <- file.info(list.files())

trimmed_details <- details[which(list.files() == list.files(pattern = "Four_model_compare_results_extinct*")),]
ord <- order(trimmed_details$mtime, decreasing = TRUE)
rownames(trimmed_details[ord,])[1]
load(rownames(trimmed_details[ord,])[1])
extinct <- Concatenated_data

trimmed_details <- details[which(list.files() != list.files(pattern = "Four_model_compare_results_extinct*")),]
ord <- order(trimmed_details$mtime, decreasing = TRUE)
rownames(trimmed_details[ord,])[1]
load(rownames(trimmed_details[ord,])[1])
extant <- Concatenated_data





```

```{r}
dim(extinct)
dim(extant)
```




```{r}

for(i in c(9,10,11,12,14,15,16,17,19,20,21,22,24,25,26,27,29,30,31,32)){
	extinct[which(is.nan(as.numeric(as.character(extinct[, i]))) == TRUE), i] <- NA
}

for(i in c(9,10,11,12,14,15,16,17,19,20,21,22,24,25,26,27,29,30,31,32)){
	extant[which(is.nan(as.numeric(as.character(extant[, i]))) == TRUE), i] <- NA
}

i <- 19
for(i in c(20,21,24,25,26,27)){
	extinct[which(as.numeric(as.character(extinct[, i])) == 0), i] <- NA
}

for(i in c(20,21,24,25,26,27)){
	extant[which(as.numeric(as.character(extant[, i])) == 0), i] <- NA
}


xlimit <- c(0,1)
ylimit <- c(0,600)
maincex <- 0.9

png(file="Global_success_rate_per_parameter.png", width=8.5, height=11, units="in", res=300)

par(mfrow=c(5,4), mar=c(3,3,3,0))


hist(as.numeric(as.character(extinct[,9])), main="speciation of F in F env", col=adjustcolor("firebrick", alpha=0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[,9])), main="speciation of F in F env", col=adjustcolor("cornflowerblue", alpha=0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)


hist(as.numeric(as.character(extinct[,10])), main="speciation of D in F env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[,10])), main="speciation of D in F env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)


hist(as.numeric(as.character(extinct[,11])), main="speciation of F in D env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[,11])), main="speciation of F in D env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[,12])), main="speciation of D in D env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[,12])), main="speciation of D in D env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

#######

hist(as.numeric(as.character(extinct[, 14])), main="extinction of F in F env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 14])), main="extinction of F in F env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)



hist(as.numeric(as.character(extinct[, 15])), main="extinction of D in F env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 15])), main="extinction of D in F env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 16])), main="extinction of F in D env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 16])), main="extinction of F in D env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 17])), main="extinction of D in D env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 17])), main="extinction of D in D env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

######

hist(as.numeric(as.character(extinct[, 29])), main="arisal of F in F env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 29])), main="arisal of F in F env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)



hist(as.numeric(as.character(extinct[, 30])), main="arisal of D in F env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 30])), main="arisal of D in F env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 31])), main="arisal of F in D env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 31])), main="arisal of F in D env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 32])), main="arisal of D in D env", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 32])), main="arisal of D in D env", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

######

hist(as.numeric(as.character(extinct[, 19])), main="NOPE -- Diffusion: source F, target F", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= c(0,18000), cex.main= maincex)
hist(as.numeric(as.character(extant[, 19])), main="NOPE -- Diffusion: source F, target F", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= c(0,18000), cex.main= maincex, add=TRUE)



hist(as.numeric(as.character(extinct[, 20])), main="Diffusion: source D, target F", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 20])), main="Diffusion: source D, target F", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 21])), main="Diffusion: source F, target D", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 21])), main="Diffusion: source F, target D", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 22])), main="NOPE -- Diffusion: source D, target D", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= c(0,18000), cex.main= maincex)
hist(as.numeric(as.character(extant[, 22])), main="NOPE -- Diffusion: source D, target D", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= c(0,18000), cex.main= maincex, add=TRUE)

####

hist(as.numeric(as.character(extinct[, 24])), main="Takeover: source F, target F", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 24])), main="Takeover: source F, target F", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)



hist(as.numeric(as.character(extinct[, 25])), main="Takeover: source D, target F", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 25])), main="Takeover: source D, target F", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 26])), main="Takeover: source F, target D", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 26])), main="Takeover: source F, target D", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)

hist(as.numeric(as.character(extinct[, 27])), main="Takeover: source D, target D", col=adjustcolor("firebrick", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex)
hist(as.numeric(as.character(extant[, 27])), main="Takeover: source D, target D", col=adjustcolor("cornflowerblue", alpha= 0.7), breaks=100, border=NA, xlim= xlimit, ylim= ylimit, cex.main= maincex, add=TRUE)


dev.off()






```


![](Global_success_rate_per_parameter.png)


```{r}



png(file="extiction minus extant per outcome.png", width=8.5, height=11, units="in", res=300)
par(mfrow=c(3,1))

plot(as.numeric(as.character(extinct[,9])), as.numeric(as.character(extinct[,14])), xlab="speciation", ylab="extinction", col= adjustcolor("firebrick", alpha=0.2), pch=19, cex=0.6, ylim=c(0,1))
plot(as.numeric(as.character(extant[,9])), as.numeric(as.character(extant[,14])), xlab="speciation", ylab="extinction", col= adjustcolor("cornflowerblue", alpha=0.2), pch=19, cex=0.6, ylim=c(0,1))

plot(as.numeric(as.character(extinct[,9])), as.numeric(as.character(extinct[,14])), xlab="speciation", ylab="extinction", col= adjustcolor("firebrick", alpha=0.2), pch=19, cex=0.6, ylim=c(0,1))
points(as.numeric(as.character(extant[,9])), as.numeric(as.character(extant[,14])), xlab="speciation", ylab="extinction", col= adjustcolor("cornflowerblue", alpha=0.2), pch=19, cex=0.6)


dev.off()


```

![](extiction minus extant per outcome.png)



```{r}

params <- extant[,-1:-35]
names(params)
break_number <- 10
xmin <-
xmax <- 

x1 <- as.numeric(params[,1])
h1 <- hist(x1,  plot=FALSE, breaks= break_number)
xfit1 <- seq(xmin, xmax, length= 100)
yfit1 <- dnorm(xfit1, mean=mean(x1), sd=sd(x1))
yfit1 <- yfit1*diff(h1$mids[1:2])*length(x1)+101.5
polygon(xfit1, yfit1, col=adjustcolor("limegreen", alpha=0.5), lwd=2, border=adjustcolor("limegreen", alpha=0.6))

```





```{r}

setwd("~/Box Sync/colliding ranges/Simulations_humans/Results/available daily summaries")
details <- file.info(list.files())

trimmed_details <- details[which(list.files() == list.files(pattern = "Four_model_compare_results_extinct*")),]
ord <- order(trimmed_details$mtime, decreasing = TRUE)
rownames(trimmed_details[ord,])[1]
load(rownames(trimmed_details[ord,])[1])
extinct <- Concatenated_data

trimmed_details <- details[which(list.files() != list.files(pattern = "Four_model_compare_results_extinct*")),]
ord <- order(trimmed_details$mtime, decreasing = TRUE)
rownames(trimmed_details[ord,])[1]
load(rownames(trimmed_details[ord,])[1])
extant <- Concatenated_data



load('~/Box Sync/colliding ranges/Simulations_humans/Available trees/real.analysis.RData')

Concatenated_data <- extant
head(Concatenated_data)
#Concatenated_data <- Concatenated_data[Concatenated_data[, 2] == "stats.no.bTO", ]
#Concatenated_data <- Concatenated_data[Concatenated_data[, 6] != "05", ]
# Concatenated_data[, 6] <- as.numeric(Concatenated_data[, 6])
# # Concatenated_data[original[, 2] == "background_takeover", 6] <-  Concatenated_data[original[, 2] == "background_takeover", 6] + 4
Concatenated_data[, 6] <- factor(Concatenated_data[, 6])
#head(Concatenated_data)
#names(Concatenated_data)

PCAdata <- Concatenated_data[, -(1:35)]
PCAdata <- PCAdata[, -12]
PCAdata <- apply(PCAdata, 2, as.numeric)
remove <- apply(is.na(PCAdata), 1, any)
PCAdata <- PCAdata[!remove, ]

# Predictions
library(randomForest)

data.analysis.comp2 <- data.frame("Model" = as.factor(Concatenated_data[!remove, 6]),
                                  PCAdata)
data.analysis.comp2$sprate <- data.analysis.comp2$trait_1_speciation/data.analysis.comp2$trait_2_speciation
data.analysis.comp2$extrate <- data.analysis.comp2$trait_1_extinction/data.analysis.comp2$trait_2_extinction


#load("Real_phy/real.analysis.RData")
a <- as.data.frame(real.analysis$results_summary_of_single_value_outputs)
a$sprate <- a$trait_1_speciation / a$trait_2_speciation
a$extrate <- a$trait_1_extinction / a$trait_2_extinction

data.analysis.comp3 <- data.analysis.comp2[, -c(2, 13:14, 16:20)]
#data.analysis.comp3 <- data.analysis.comp3[data.analysis.comp3$Model %in% 1:4, ]
#data.analysis.comp3$Model <- factor(data.analysis.comp3$Model)
#sub <- unlist(lapply(as.list(c(1:4)), function(x, y) {
#  sample(which(y$Model == x), min(table(data.analysis.comp3$Model)))},
#  y = data.analysis.comp3))
# data.analysis.comp3 <- data.analysis.comp3[sub, ]
fun <- function(x, y, per = .33) {sample(which(y$Model == x), round(table(y$Model)[1]*per))}

sub.test <- unlist(lapply(as.list(paste0(0, c(1:4))), fun,
                          y = data.analysis.comp3))
test2 <- data.analysis.comp3[sub.test, 2:ncol(data.analysis.comp3)]
test1 <- data.analysis.comp3[sub.test, 1]
train <- data.analysis.comp3[-sub.test, ]

train[, -1] <- apply(train[, -1], 2, as.numeric)
test2 <- as.data.frame(apply(test2, 2, as.numeric))
infinites <- which(apply(train[, -1], 2, is.infinite), arr.ind=T)
if (nrow(infinites) > 0) {
train <- train[-infinites[, 1], ]
}
infinites2 <- which(apply(test2, 2, is.infinite), arr.ind=T)
if (nrow(infinites2) > 0) {
test2 <- test2[-infinites2[, 1], ]
test1 <- test1[-infinites2[, 1]]
}


for(i in 1:100){
(fit <- randomForest(Model ~ ., data=train, xtest = test2, ytest = test1, 
                    importance=TRUE, ntree=1000, keep.forest = TRUE, replace=FALSE))

predictions <- predict(fit, 
                       a,
                       type="prob")
predictions

save(fit, file=paste0("~/Box Sync/colliding ranges/Simulations_humans/Results/RF_daily_output/RF_daily_output_", format(Sys.time(), format="%d_%b_%Y"), "_",i, "_NoREPLACEMENT_.Rdata"))
}
```

```{r}


save(fit, file=paste0("~/Box Sync/colliding ranges/Simulations_humans/Results/RF_daily_output/RF_daily_output_", format(Sys.time(), format="%d_%b_%Y"),".Rdata"))
```




```{r}

plot(fit, ylim=c(0,1))

```




```{r}
labs <- c("Basic", "+Diffusion", "+Takeover", "+Diffusion +Takeover")

```


```{r}
# bar plot
png("Prob_aus.png", width = 25, height = 25, res = 300, units = "in")
par(mar = c(8, 8, 1, 1))
pred <- setNames(as.numeric(predictions), labs)
cols <- rev(c("darkgreen", "red", "blue", "darkorange1"))
barplot(pred, col = cols, ylab = "Proability", cex.lab = 3, cex.names = 2)
dev.off()
```

![](Prob_aus.png)


```{r}
# Plot confusion matrix
png("Conffusion_matrix_all.png", width = 25, height = 25, res=300, units="in")
par(mar = c(10, 11, 1, 1))
colors1 <- colorRampPalette(colors = c("#f0f0f0", "#bdbdbd","#636363"))
prop <- apply(fit$confusion[, -5], 2, function(x){x / sum(x)}) * 100

image(prop, col = colors1(20), axes=FALSE)
axis(1, at=c(0, .33, .66, 1), labels=labs, tick = FALSE, line = FALSE, cex.axis = 3.5, pos = -.19)
axis(2, at=c(0, .33, .66, 1), labels=labs, tick = FALSE, line = FALSE, cex.axis = 3.5)
mtext("ACTUAL", side = 1, padj = 3, cex = 4)
mtext("PREDICTED", side = 2, padj = -3, cex = 4)

for(i in 1:4) {
  for(j in 1:4) {
    text(x = c(0, .33, .66, 1)[i], y = c(0, .33, .66, 1)[j], paste0(round(prop[i, j], 2), "%"),
         cex = 5)
  }
}
dev.off()
```


![](Conffusion_matrix_all.png)


```{r}
importance(fit)
```


```{r}
# Variables importance

imp <- importance(fit)
imp <- apply(imp, 2, function(x) (x - min(x))/(max(x) - min(x)))
imp <- imp[sort(imp[, 5], index.return = TRUE, decreasing = TRUE)$ix, ]


names <- rownames(imp)
names[names == "spatial.tests.fora"] <- "Space F"
names[names == "spatial.tests.dom"] <- "Space D"
names[names == "sprate"] <- "Sp(ratio)"
names[names == "transition_from_trait_1_to_2"] <- "TR(1-2)"
names[names == "transition_from_trait_2_to_1"] <- "TR(2-1)"
names[names == "Phylogenetic_signal"] <- "PhySig(D)"
names[names == "Evolutionary_distinctiveness_sum"] <- "EDsum"
names[names == "Pylo_diversity_is_sum_of_BL"] <- "PDsum"
names[names == "transition_rate_ratio_1to2_over_2to1"] <- "TR(ratio)"
names[names == "gamma"] <- "Gamma"
names[names == "mean_Phylogenetic_isolation"] <- "MPI"
names[names == "extrate"] <- "Ext(ratio)"
names[names == "average_phylogenetic_diversity_is_mean_of_BL"] <- "PDmean"
names[names == "extinction_per_speciation"] <- "DR"
names[names == "variance_Phylogenetic_isolation"] <- "VPI"
names[names == "F_quadratic_entropy_is_sum_of_PD"] <- "F"
names[names == "Mean_pairwise_distance"] <- "MPD"
names[names == "variance_Pylo_diversity_is_variance_of_BL"] <- "PDvar"
names[names == "variance_pairwise_distance"] <- "VPD"


png("var_import_all.png", width = 25, height = 25, unit="in", res=300)
par(mar = c(10, 18, 1, 1))
plot(x = rev(imp[, 5]), y = 1:nrow(imp), type = "l", yaxt = "n", 
     ylab = "", xlab = "Variable Importance",
     xlim = c(0, 1), lwd = 2, cex.lab = 4)
for (i in 1:nrow(imp)) {
  abline(h = i, lty = 3, col = "gray80")
}
abline(v = seq(0, 1, 1/19), lty = 3, col = "gray80")

lines(x = rev(imp[, 4]), y = 1:nrow(imp), col = "darkgreen", lwd = 2)
lines(x = rev(imp[, 3]), y = 1:nrow(imp), col = "red", lwd = 2)
lines(x = rev(imp[, 2]), y = 1:nrow(imp), col = "blue", lwd = 2)
lines(x = rev(imp[, 1]), y = 1:nrow(imp), col = "darkorange1", lwd = 2)
lines(x = rev(imp[, 5]), y = 1:nrow(imp), lwd = 3)

points(x = rev(imp[, 4]), y = 1:nrow(imp), col = "darkgreen", cex = 2)
points(x = rev(imp[, 3]), y = 1:nrow(imp), col = "red", cex = 2)
points(x = rev(imp[, 2]), y = 1:nrow(imp), col = "blue", cex = 2)
points(x = rev(imp[, 1]), y = 1:nrow(imp), col = "darkorange1", cex = 2)
points(x = rev(imp[, 5]), y = 1:nrow(imp), pch = 20, cex = 3)


text(y = 1:nrow(imp), x = par("usr")[1] - .17, labels = rev(names),
     srt = 0, pos = 4, xpd = T, cex = 4)
dev.off()
```

![](var_import_all.png)




```{r}
par(mfrow=c(2,3))

# Box plots
boxplot(spatial.tests.fora ~ Model, data = data.analysis.comp3)
abline(h = a$spatial.tests.fora, col = "red", lty = 2)

boxplot(spatial.tests.dom ~ Model, data = data.analysis.comp3)
abline(h = a$spatial.tests.fora, col = "red", lty = 2)

boxplot(log(sprate) ~ Model, data = data.analysis.comp3, ylim = c(-10, 10))
abline(h = log(a$sprate), col = "red", lty = 2)

boxplot(log(extrate) ~ Model, data = data.analysis.comp3, ylim = c(-10, 10))
abline(h = log(a$extrate), col = "red", lty = 2)

boxplot(log(transition_rate_ratio_1to2_over_2to1) ~ Model, data = data.analysis.comp3)
abline(h = log(a$sprate), col = "red", lty = 2)

boxplot(Phylogenetic_signal ~ Model, data = data.analysis.comp3, ylim = c(0, 1))
abline(h = a$Phylogenetic_signal, col = "red", lty = 2)


```



```{r}
#build a data tracking table to track parameter changes through time

str(fit)

y

```









































<!--chapter:end:Module_3_markdown.Rmd-->

---
title: "D-place FARM documentation: Module 3 through time"
author: "Ty Tuff, Bruno Vilela, and Carlos Botero"
date: 'project began: 15 May 2016, document updated: `r strftime(Sys.time(), format
  = "%d %B %Y")`'
output:
  html_notebook: default
  html_document: default
  pdf_document: default
  word_document: default
bibliography: FARM package.bib
---


```{r}
setwd("~/Box Sync/colliding ranges/Simulations_humans/Results/RF_daily_output")

details <- file.info(list.files())

trimmed_details <- details[which(list.files() == list.files(pattern = "Four_model_compare_results_extinct*")),]
ord <- order(trimmed_details$mtime, decreasing = TRUE)
rownames(trimmed_details[ord,])[1]
load(rownames(trimmed_details[ord,])[1])
extinct <- Concatenated_data




```


```{r}
str(fit)
```


```{r}
fit$confusion
as.vector(fit$confusion)
as.vector(fit$test$confusion)

```


```{r}
fit$importance[,5]
fit$importanceSD[,5]
```



```{r}


str(fit)
fit$confusion
plot(fit$err.rate[,1])
lines(fit$err.rate[,2])
lines(fit$err.rate[,3])
lines(fit$err.rate[,4])
lines(fit$err.rate[,5])


```

<!--chapter:end:Module_3_through_time.Rmd-->


---
title: "Tree and space analysis function"
author: "Ty Tuff and Bruno Vilela"
date: '`r strftime(Sys.time(), format = "%B %d, %Y")`'
output: html_document
---




```{r }
load('~/Downloads/FULL_TREE_Society_data_with_binary_conversions.Rdata')

load('~/Downloads/Tree_FULL_trimmed.Rdata')

this_tree <- full_tree 


load('~/Downloads/download.Rdata')
objects()
str(myOut)

this_tree <- myOut$mytree
this_world <- myOut$myWorld
```

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r}

stop <- function(){


library(ape)
library(phytools)
library(diversitree)
library(spdep)
library(Rcpp)
library(msm)
library(FARM)
options(repos = c(CRAN = "http://cran.rstudio.com"))
install.packages("devtools")
library(devtools)
install_github("BrunoVilela/FARM")
library(FARM)

load('~/Downloads/FULL_TREE_Society_data_with_binary_conversions.Rdata')

load('~/Downloads/Tree_FULL_trimmed.Rdata')

this_tree <- full_tree 


load('~/Downloads/download.Rdata')
objects()
str(myOut)


this_world <- myOut$myWorld

## This module analyzes the results from module 1 and returns a list based on how many values each stat returns
## Ty Tuff and Bruno Vilela
## 24 August 2016

###### Specify function ##############################



    #  this_tree <- Module_1_output$mytree
    #  this_world <- Module_1_output$myWorld


      ##### (0) Pull necessary variables from simulated trees and organize into a single object for all the tests below to pull from.

      #str(all_trees)
      #str(this_tree)


      ## 0a) Branch lengths
      Branch_Lengths <- this_tree$edge.length
      number_of_branches <- length(Branch_Lengths)

      # Anchor test = PD (Faith's phylogenetic diversity)
      Pylo_diversity_is_sum_of_BL <- sum(Branch_Lengths)

      # avPD -- Average phylogenetic diversity
      average_phylogenetic_diversity_is_mean_of_BL <- mean(Branch_Lengths)

      variance_Pylo_diversity_is_variance_of_BL <- var(Branch_Lengths)
      cat("-")
      ## 0b) Pairwise distance between tips
      Pairwise_dist <- cophenetic(this_tree)
      cat("-")
      # 2b) Pairwise distance -- Sum of pairwise distances

      # F -- Extensive quadratic entropy
      F_quadratic_entropy_is_sum_of_PD <- sum(Pairwise_dist)

      #Mean inter-species distances

      # Anchor test = MPD (mean pairwise distance)

      Mean_pairwise_distance <- mean(Pairwise_dist)

      cat("-")
      #Pairwise distance/all distances -- Variance of pairwise distances

      # Anchor test = VPD (variation of pairwise distance)

      variance_pairwise_distance <- var(as.vector(Pairwise_dist))




      ## 0c) Phylogenetic isolation

      # Using equal.splits method, faster computation
      Evolutionary_distinctiveness <- evol.distinct2(this_tree, type = "equal.splits")
      cat("-")
      # ED - Summed evolutionary distinctiveness

      Evolutionary_distinctiveness_sum <- sum(Evolutionary_distinctiveness)

      ## 3d) Phylogenetic isolation -- Mean of species evolutionary distinctiveness

      # mean(ED)

      mean_Phylogenetic_isolation <- mean(Evolutionary_distinctiveness)

      ## 4d) Phylogenetic isolation -- Variance of species isolation metrics

      #var(ED)

      variance_Phylogenetic_isolation <- var(Evolutionary_distinctiveness)
      cat("-")

      ## Tree topology

      #Gamma index

      ltts <- ltt(this_tree, gamma = TRUE, plot = FALSE)
      lineages_through_time <- as.numeric(ltts[[1]])
      time_steps <- as.numeric(ltts[[2]])
      gamma <- ltts[[3]]
      gamma_p_value <- ltts[[4]]
      cat("-")

      ##### (5) Tree metric -- Macroevolutionary - Rate and rate changes ###############
      ##################################################

      ## Speciation vs extinction rates and Net diversification
      bds <- bd(this_tree)
      speciation_rate <- bds[1]
      extinction_rate <- bds[2]
      extinction_per_speciation <- bds[3]
      speciation_minus_extinction <- bds[4]
      cat("-")


      ## Speciation vs extinction rates and Net diversification dependent on trait
      N.for.dom <- table(this_world[, 6])
      if(length(N.for.dom) == 2) {
        par.div.dep <- DivDep( mytree = this_tree, myWorld = this_world)
        trait_1_speciation <- par.div.dep[1]
        trait_2_speciation <- par.div.dep[2]
        trait_1_extinction <- par.div.dep[3]
        trait_2_extinction <- par.div.dep[4]
        transition_from_trait_1_to_2 <- par.div.dep[5]
        transition_from_trait_2_to_1 <- par.div.dep[6]
        transition_rate_ratio_1to2_over_2to1 <- transition_from_trait_1_to_2/transition_from_trait_2_to_1
        cat("-")


        ## Crown age per trait AUC and effect size
        tip.length <- this_tree$edge.length[this_tree$edge[, 2] %in% 1:Ntip(this_tree)]
        tip.length <- (tip.length - min(tip.length)) / (max(tip.length) - min(tip.length))
        this_trait <- this_world[match(this_tree$tip.label, this_world[, 8]), 6]
        tip.length.2 <- tip.length[this_trait == 2]
        tip.length.1 <- tip.length[this_trait == 1]
        model <- glm(as.factor(this_trait) ~ log(tip.length + 1),
                     family = "binomial")
        effect.size <- model$coefficients[2]
        plot(y = this_trait - 1, x= log(tip.length))
        p <- predict(model, as.factor(this_trait), type = "resp")
        points(y = p, x = log(tip.length), col = "red")
        pr <- prediction(p, as.factor(this_trait))
        auc.model <- performance(pr, measure = "auc")@y.values[[1]]

        ## Phylogenetic signal (D)
        Phylogenetic_signal <- Dsig(mytree = this_tree, myWorld = this_world)
        cat("-")

        ## Spatial Analysis
        nbs0 <- knearneigh(as.matrix(this_world[, 2:3]), k = 7, longlat = TRUE)
        nbs <- knn2nb(nbs0, sym = TRUE) # 7 symmetric neighbors
        nbs.listw <- nb2listw(nbs)
        factors.nbs <- as.factor(ifelse(is.na(this_world[, 6]), 3, this_world[, 6]))
        spatial.tests <- joincount.test(fx = factors.nbs, listw = nbs.listw)
        spatial.tests.fora <- spatial.tests[[1]]$statistic
        spatial.tests.dom <- spatial.tests[[2]]$statistic
        prevalence <- (N.for.dom[1] - N.for.dom[2]) / sum(N.for.dom)
        cat("-")
      } else {
        trait_1_speciation <- NA
        trait_2_speciation <- NA
        trait_1_extinction <- NA
        trait_2_extinction <- NA
        transition_from_trait_1_to_2 <- NA
        transition_from_trait_2_to_1 <- NA
        transition_rate_ratio_1to2_over_2to1 <- NA
        Phylogenetic_signal <- NA
        spatial.tests.fora <- NA
        spatial.tests.dom <- NA
        auc.model <- NA
        effect.size <- NA
        prevalence <- ifelse(names(table(this_world[, 6])[1]) == "1", 1,
                             -1)
        cat("---")

      }




      results_summary_matrix_1 <- cbind(

        number_of_branches,
        Pylo_diversity_is_sum_of_BL,
        average_phylogenetic_diversity_is_mean_of_BL,
        variance_Pylo_diversity_is_variance_of_BL,

        F_quadratic_entropy_is_sum_of_PD,
        Mean_pairwise_distance,
        variance_pairwise_distance,

        Evolutionary_distinctiveness_sum,
        mean_Phylogenetic_isolation,
        variance_Phylogenetic_isolation,

        gamma,
        gamma_p_value,
        speciation_rate,
        extinction_rate,
        extinction_per_speciation,
        speciation_minus_extinction,
        trait_1_speciation,
        trait_2_speciation ,
        trait_1_extinction ,
        trait_2_extinction ,
        transition_from_trait_1_to_2 ,
        transition_from_trait_2_to_1 ,
        transition_rate_ratio_1to2_over_2to1 ,
        Phylogenetic_signal,
        spatial.tests.fora,
        spatial.tests.dom,
        prevalence,
        auc.model,
        effect.size
      )
      rownames(results_summary_matrix_1) <- 1

      results_summary_matrix_2 <- cbind(
        c(Evolutionary_distinctiveness,NA),
        lineages_through_time,
        time_steps
      )
      colnames(results_summary_matrix_2) <- c("Evolutionary_distinctiveness", "lineages_through_time", "time_steps")
     # head(results_summary_matrix_2)

      ### Returns from function in list form
      returns <- list(
        #Branch_Lengths,
        #Pairwise_dist,
        results_summary_matrix_1,
        results_summary_matrix_2

      )

      names(returns) <- c(
        #"Branch_Lengths",
        #"Pairwise_distance",
        "results_summary_of_single_value_outputs",
        "results_summary_matrix_of_multi_value_outputs"
      )
      cat("] 100%")

      

    }
  


#Module_2(myOut)

```

## Including Plots

You can also embed plots, for example:

```{r pressure, echo=FALSE}
plot(pressure)
```

Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.

<!--chapter:end:Tree_and_space_analysis_function.Rmd-->

---
title: "D-place FARM documentation"
author: "Ty Tuff, Bruno Vilela, and Carlos Botero"
date: 'project began: 15 May 2016, document updated: `r strftime(Sys.time(), format
  = "%d %B %Y")`'
bibliography: FARM package.bib  
output:
  html_notebook: default
  html_document: default
  pdf_document: default
  word_document: default

---

@Nielsen

# Module 2: Space and phylogeny summary statistics 
  This is the master document for Module 2, a foundational function in our FARM package that analyzes results from Module 1, the other foundational function. Module 1 simulates a spatial pattern and a phylogenetic tree given a set of environmental and inheritance rules and then Module 2 summarizes those simulated results using a large set of targeted summary statistics. Here we describe our choice of summary statistics, justify those choices as part of a larger theoretical context, and provide our reproducable code for executing the anaylses yourself. These two parts are seperated into modules so that they can act independently. An combination of spatial pattern and associated phylogeny many be used as long as they are formatted correctly.         

This pipeline was designed to analyze a simulated world where all the information is known about both the world and the tree. There is no missing information, just extinct trees. This is much different than our real tree that has loads of uncertainty unevenly destributed across it. The result you see demonstrated right now are one simulated result of many. I need to do a sister page to this were we do this entire analysis on the real tree, or best real tree we've got. 

We have four types of data available for asking research questions using D-place data: [phylogenies](#phylogenetic-summary-statistics), [spatial locations](#spatial-locations), trait identities, and environmental reconstructions. Any one of these four data types alone are relatively information poor, so we are searching for ways to model connections between these data types to draw stronger conclusions overall. 

Other modules can use the summary statistics generated from this module to test hypotheses. We currently have a ABC and Random Forest module started but there will be more to come. 

These are quantitative connections that we are assumed in the analyses, but we don't actually have any support in the data for doing so. 
1. nearest neighbor connectivity measures
2. Abundance estimates
3. Pairwise influence (history) between cultures. 
4. Environmental reconstruction validation evidence 

# Phylogenetic summary statistics  

Whole tree vs. part of tree? These statistics are generally used to compare one sample to another. For example, an experimental contrast between two sites, two phylogenetic groups, or two communities in two different locations. Here we are calculating these statistics for the global langauage tree to compare against global trees created in our simulation. You still retain the ability to subset this tree or others and send only those subsets through this code to compare the values with each other afterwards. 

* Introduction and framework
* [Alpha Diversity metrics](#alpha-diversity-metrics) 
  1. [Branch Length (richness and divergence)](#branch-lengths)
  2. [Pairwise distance between tips (richness, divergence, and regularity)](#pairwise-distance-between-tips)
  3. [Phylogenetic isolation (divergence, and regularity)](#phylogenetic-isolation)
  4. [Nearest Neighbor (divergence, and regularity)](#nearest-phylogenetic-neighbor) 
* [Beta Diversity](#beta-diversity)
* [Tree topology](#tree-topology)
* [Macroevolutionary rates](#macroevolutionary-rates)


All trees are ultrametric.



## Introduction and framework


  The choice of phylogenetic analyses and organizational scheme is based on the suggestions of Tucker et al. 2016. Here are a few images from that paper for an overview:  


![From Tucker et al. 2016. Biological Reviews](Images/Tucker_2016/tree_concept.jpg)
![From Tucker et al. 2016. Biological Reviews](Images/Tucker_2016/alpha_PCA.jpg)

![From Tucker et al. 2016. Biological Reviews](Images/Tucker_2016/Three_type_summary.jpg)


![From Tucker et al. 2016. Biological Reviews](Images/Tucker_2016/tutorial.jpg)

```{r}
#load('~/Downloads/FULL_TREE_Society_data_with_binary_conversions.Rdata')

load('~/Downloads/Tree_FULL_trimmed.Rdata')

this_tree <- full_tree 


load('~/Downloads/download.Rdata')

#str(myOut)

this_tree <- myOut$mytree
this_world <- myOut$myWorld

#str(this_world)
  #str(all_trees)
  #str(this_tree)


#library(knitr)
#kable(this_world, caption= "This is our world")
```


##Alpha diversity metrics

### Branch Lengths
  Branch length data is embedded in the tree object provided to this function. The first step in summarizing the lengths is to extract those data from the tree object. These data are called 'edges' in the tree object. We extract branch lengths and create an object called 'Branch_lengths' for passing on to the other summary functions. The histogram below shows the frequency of different branch lengths found throughout the tree. 
  
```{r}
      Branch_Lengths <- this_tree$edge.length
```

```{r, echo=FALSE}
 # plot the branch lengths
      hist(Branch_Lengths, xlab="Branch Lengths", main="", breaks=1000, border=NA, col=adjustcolor("cornflowerblue", alpha=0.9))
```
We can summarize branch lengths according to normal summary statistics, but it can be difficult to assign evolutionary meaning to some of these metrics and so they are not regularly used as best I can tell. This lack of meaning does not mean that these statistics couldn't be used to distinguish between large simulated trees. 
```{r}
      mean_branch_length <- mean(Branch_Lengths)
      variance_branch_length <- var(Branch_Lengths)
      SD_branch_length <- sd(Branch_Lengths)
```


```{r, echo=FALSE}
paste("mean branch length = ", mean_branch_length)
paste("variance in branch lengths = ", variance_branch_length)
paste("standard deviation in branch lengths = ", SD_branch_length)

```

```{r, echo=FALSE}
par(mar=c(2,4,0,0))
boxplot(Branch_Lengths, col=adjustcolor("cornflowerblue", alpha=0.5), border=adjustcolor("cornflowerblue", alpha=1), ylab="Branch Lengths")
```

#### Phylogenetic diversity ($PD$)
Phylogenetic diversity ($PD$) is the summation ($\sum$) of all branch lengths connecting species together, where $B_{t}$ is the set of included tips and $L_{b}$ is Branch lengths. (Faith 1992) This is an anchor test, which means it is regularly used, well understood, and we should use it to anchor our work to past work. PD is a richness measure, it tells us how much evolutionary history is associated with a set of tips.  

$$PD = \sum_{b \in B_{t}}^{}L_{b}$$
 
```{r}

# Anchor test = PD (Faith's phylogenetic diversity)
      Pylo_diversity_is_sum_of_BL <- sum(Branch_Lengths)
      Pylo_diversity_is_sum_of_BL
```


 There are variations on this measure that we have NOT implemented here. It is popular to scale this measure according to some ecological driver. Barker (2002) scales branch lengths ($L_{b}$) by multiplying them against the abundance of individuals at at tip ($A_{b}$). Others (Rosauer et al. 2009), scale them by their range size instead ($R_{b}$). 
 
 $$\Delta n PD = \sum_{b \in B_{t}}^{}A_{b}L_{b}$$
$$PE = \sum_{b \in B_{t}}^{}\dfrac{L_{b}}{R_{b}}$$
Argueing that proportional abundance phylogenetic diversity ($PD_{Ab}$) is more effective than the standard PD calculated from raw abundance, Vellend et al. penned a new version of PD where $B$ is the total number of branch lengths ($L_{b}$). Note: We don't have abundance data right now for the human project so this metric is not currently very helpful.   
$$PD_{Ab} = B * \dfrac{\sum_{b \in B_{t}}^{}A_{b}L_{b}}{\sum_{b \in B_{t}}^{}A_{b}}$$

```{r}
      #Calculate B
      number_of_branches <- length(Branch_Lengths)
      number_of_branches
```

#### Average phylogenetic diversity ($avPD$)
Average phylogenetic diversity ($avPD$) (introduced by Clarke and Warwick 2001) is a branch length-based divergence indices where PD is divided by the total number of tips ($S$) in the tree. 
$$avPD = \dfrac{PD}{S}$$
```{r}
      Number_of_tips <- length(this_tree$tip.label)
      average_phylogenetic_diversity <- Pylo_diversity_is_sum_of_BL / Number_of_tips
      average_phylogenetic_diversity
```


There is also a proportional abundance version of average phylogenetic diversity ($avPD_{Ab}$) (Tucker et al. 2016). Again, we don't have abundance values yet for D-place. 
$$avPD_{Ab} = \dfrac{B * \dfrac{\sum_{b \in B_{t}}^{}A_{b}L_{b}}{\sum_{b \in B_{t}}^{}A_{b}}}{S}$$


### Pairwise distance between tips

This is the patristic distance, the sum of the branch lengths following the shortest distance between two tips in a tree, implemented as a distance matrix where every tip is compared to every other tip. This distance function can be anything. We use euclidean and environmental distance matrices heavily in the spatial analyses. 

#### Calculate the patristic distance between two taxa, for all taxa
Calculate the patristic distance between two taxa using the R package 'phytools', this function takes a 'phylo' tree object and returns a distance matrix between tips. Need original citation.

```{r}
    library(phytools)
  
    ls("package:phytools")
 ## Pairwise distance between tips - From library(ape) in library(phytools)
    Pairwise_dist <- cophenetic(this_tree)

```
yields a distance matrix (list of 2D matrices) of all distances between taxa.


```{r, echo=FALSE}
str(Pairwise_dist)
```

#### Sum of all pairwise distances ($F$)
Now we can use a set of summary statistics to describe those pairwise distances. The sum of all pairwise distances, $F$, is formally called 'Extensive quadratic entropy'. (Izsak and Papp (2000) and Szeidl (2002)). Just as it was with branch lengths, this is a richness measure and, accordingly, should be used to answer richness questions. 

$$F = \sum_{i} \sum_{j} d_{ij}$$
```{r}
      # F -- Extensive quadratic entropy
      F_quadratic_entropy_is_sum_of_PD <- sum(Pairwise_dist)
```

#### Mean pairwise distance (MPD)
Mean inter-species distances. The mean of all pairwise distances, $MPD$ (a.k.a. $AvTD$, and $\Delta^{+}$), is the mean distance between species. (Clarke and Warwick (1998), Webb et al. (2002, 2008), Kembel et al. (2010))
$$MPD = \dfrac{\sum_{ij} d_{ij}}{S(S-1)}$$

```{r}
      # Anchor test = MPD (mean pairwise distance)

      Mean_pairwise_distance <- Pairwise_dist / (Number_of_tips * (Number_of_tips - 1) ) 
      
```

```{r, echo=FALSE}
str(Mean_pairwise_distance)
```

#### MPD anchored to the root 
There is an extention to mean pairwise distance calculations from Helmus et al. (2007) called $PSV$, $PSR$, and $PSE$, phylogenetic species variability, phylogenetic species richness, and phylogenetic species evenness. These measures take the basic pairwise distance calculations and anchor them to the root of the tree so distances have a common denominator. This extention is implemented by using the same equations, just with a constrained set of $d_{ij}$ conditions. Specifically,




$$PSV = MPD = \dfrac{\sum_{ij} d_{ij}}{S(S-1)}$$
$$PSR = \sum_{i} {(\dfrac{1}{S-1} \sum_{j} {d_{ij}})}$$
$$PSE = \dfrac{S}{S-1} \sum_{ij} d_{ij}p_{i}p_{j}$$


with these specific values of $d_{ij}$

$$d_{ji}=0.5*(c_{ii} + c_{jj} - c_{ij}) \\ 
\ \\
or \\
\ \\
d_{ij} = 1 - c_{ij} / (\sqrt{c_{ii}c_{jj}}) 
$$
and
$$
c_{ii} = the \ sum \ of \ branch \ lengths \ from \ tip \ i  \ to \ the \ root \ of \ the \ phylogenetic \ tree. \\
\ \\
c_{ij} = the \ sum \  of \ branch \ lengths \ from \ first \ common \ ancestor \ for \ i \ and \ j \ to \ the \ root.
$$



#### Average distance between two randomly chosen species
$J$, Intensive quadratic entropy, which is the average distance between two randomly chosen species. (Izsak and Papp 2000)
$$J = \dfrac{\sum_{ij}d_{ij}}{S^2} $$




#### Simpson's diversity index for pairwise distance
There has been a long effort to pen a phylogenetic analogy to a Simpson's diversity index. (see Rao (1982), Clarke and Warwick (1998), Pavoine et al. (2005), Hardy and Senterre (2007), Webb et al. (2002, 2008), Kembel et al. (2010)). The conclusion seems to be that this measure is equivilent to scaling $MPD$ by abundance $p_{i}$ and $p_{j}$ to get $MPD_{Ab}$. This is also a special case of Rao's Quadratic Entropy, $Roe's QE$. Note: not using abundance measures yet for D-place data. 

$$MPD_{Ab} = \sum_{i} \sum_{j} d_{ij} p_{i} p_{j}$$




#### Interspecific comparisons of pairwise distances
The interspecific variant (rather than the intraspecific default described above) defines the expected phylogenetic distance between two indivdiuals randomely drawn conditionally on the fact that they indivdulas from different species. 
$$InterMPD_{Ab} = \dfrac{\sum_{i} \sum_{j \ne i} d_{ij} p_{i} p_{j} }{\sum_{i} \sum_{j \ne i} d_{ij} p_{i} p_{j}}  $$



#### Variance in pairwise distances ($VPD$)
Variance in pairwise distances, $VPD$ (a.k.a. $VarTD$ and $\Lambda^+$), is a regularity indices. Clarke and Warwick (2001) Variance is relative to tips, $S$, not to total branches ($B$ from above). These are the residuals, they compare each individual pairwise connection to the overall mean. 

$$VPD = \dfrac{1}{S(S-1)} (\sum_{i} \sum_{j \ne i} {(d_{ij} - MPD)^2})$$




```{r}
    #Pairwise distance/all distances -- Variance of pairwise distances

      # Anchor test = VPD (variation of pairwise distance) #need to adjust to equation above. 

      variance_pairwise_distance <- var(as.vector(Pairwise_dist))
```


Variants of $VPD$ are $VPD_{ab}$ and $InterPVD_{Ab}$, where variance is scaled by abundance or compared in and out of species. These are also regularity indices. 
$$
VPD_{Ab}= 
(\sum_{i} \sum_{j} n_{i} n_{j}) *
\dfrac{\sum_{i} \sum_{j} n_{i} n_{j} (d_{ij} - MPD_{Ab})^2}
{(\sum_{i} \sum_{j} n_{i} n_{j})^2 - \sum_{i} \sum_{j} (n_{i} n_{j})^2}
\\
or
\\
InterVPD_{Ab} = 
(\sum_{i} \sum_{j \ne i} n_{i} n_{j}) *
\dfrac{\sum_{i} \sum_{j \ne i} n_{i} n_{j} (d_{ij} - InterMPD_{Ab})^2}
{(\sum_{i} \sum_{j \ne i} n_{i} n_{j})^2 - \sum_{i} \sum_{j \ne i} (n_{i} n_{j})^2}
$$

### Nearest phylogenetic neighbor

#### Divergence indices
Divergence indices using nearest distance: $MNTD$ and $MNTD_{Ab}$, Mean nearest taxon distance and Abundance-weighted MNTD. (Webb et al. (2002,2008), Kembel et al. (2010)). 

$MNTD$, mean nearest taxon distance, is the mean shortest distance from a species to all other in the assemblage. Webb et al (2002, 2008), Kembel et al. (2010)  
$$
MNTD = 
\dfrac{1}{S}
\sum_{i}
d_{i_{min}}
$$

$MNTD_{AB}$, abundance adjusted mean nearest taxon distance. Adjusted by species proportions (i.e. species' relative abundances) Webb et al (2002, 2008), Kembel et al. (2010)   
$$
MNTD_{Ab} = 
\sum_{i=1}^{S}
[d_{i_{min}} * p_{i}]
$$

#### Regularity indices
Regularity indices using nearest distances: $VNTD$, $VNTD_{Ab}$, $PE_{ev}$.  

$VNTD$, Variance in nearest taxon distances, is the variance in nearest pairwise distance. Tucker et al. (2016)
$$
VNTD = \dfrac{1}{S}
\sum_{i-1}^{S}
[(d_{i_{min}} - MNTD)^2]
$$
$VNTD_{Ab}$, Abundance weighted variance in nearest taxon distances, is scales by abundance in the same way as descried above. Tucker et al. (2016)
$$
VNTD_{Ab} = \dfrac
{(\sum_{i} n_{i}) \sum_{i} n_{i} (d_{i_{min}} - MNTD_{Ab})^2}
{(\sum_{i} n_{i})^2 - \sum_{i} n_{i} ^2}
$$

#### Phylogenetic version of the funtional $FE_{ve}$ index
$PE_ve$, phylogenetic evenness is a phylogenetic version of the funtional $FE_{ve}$ index. First a minimu spannin tree (MST) is computed using the cophenetic distance obtained from the phylogenetic tree. The MST contains S-1 Branches connection the S species. We denote l a branch on the MST, dist(i,j) is the lengththe branch l that connects species i and j. ni is, as defined abouve, the abundce of species i in the asseblage. Villeger et al. (2008), Dehling et al. (2014).

$$
Weighted \ evenness: \\
EW_{i} = \dfrac{dist(i,j)}
{(n_{i} + n_{j})/(\sum_{k=1}^{S}n_{k})} \\
\ \\
Partial \ weighted \ evenness: \\ 
PEW_{l} = \dfrac
{EW_{l}}
{\sum_{l=1}^{S-1} EW_{l}} \\
\ \\
Phylogenetic \ evenness: \\
PE_{ve} = \dfrac
{\sum_{l=1}^{S-1} min(PEW_{l}, \dfrac{1}{S-1}) - (\dfrac{1}{S-1})}
{1- (\dfrac{1}{S-1})}
$$

### Phylogenetic isolation
A phylogenetic isolation index represents the relative isolation of a given species within a phylogenetic tree. Several indices have been proposed so far but we focus here on the evolutionary distinctiveness index called ‘Fair Proportion’ as proposed by Redding (2003) and Isaac (2007). 

#### Evolutionary distinctiveness (richness indices)
$ED$, evolutionary distinctiveness is a richness indices. NOTE: not equal to Faith's PD because the $ED_{i}$ are computed from the regional pool of species and sumed across a given assemblage (i.e. a subset of the regional species pool). Tucker (2016), Safi et al. (2013), Redding (2003), and Isaac (2007)

$$
  ED = \sum_{i}ED_{i} \\
  \ \\ 
  where \ ED_{i} = \sum_{b \in B_{t_{i}}} \dfrac{L_{b}}{S_{b}} 
$$

$AED$, Abundance-weighted $ED$. Tucker (2016) and Cadotte et al. (2010)
$$
\sum_{i} AED_{i} \\
\ \\
where \ AED_{i} = \sum_{b \in B_{t_{i}}} \dfrac{L_{b}}{A_{b} }* p_{i}
$$

```{r}

# Bruno's function for ED. Provided in library(FARM)

evol.distinct2 <- function (tree, type = c("equal.splits", "fair.proportion"), 
    scale = FALSE, use.branch.lengths = TRUE) 
{
    type <- match.arg(type)
    if (is.rooted(tree) == FALSE) 
        warning("A rooted phylogeny is required for meaningful output of this function", 
            call. = FALSE)
    if (scale == TRUE) {
        if (is.ultrametric(tree) == TRUE) 
            tree$edge.length <- tree$edge.length/(as.numeric(branching.times(tree)[1]))
        else tree$edge.length <- tree$edge.length/sum(tree$edge.length)
    }
    if (use.branch.lengths == FALSE) 
        tree$edge.length <- rep(1, length(tree$edge.length))
    for (i in 1:length(tree$tip.label)) {
        spp <- tree$tip.label[i]
        nodes <- .get.nodes(tree, spp)
        nodes <- nodes[1:(length(nodes) - 1)]
        internal.brlen <- tree$edge.length[which(tree$edge[, 
            2] %in% nodes)]
        if (length(internal.brlen) != 0) {
            internal.brlen <- internal.brlen * switch(type, equal.splits = sort(rep(0.5, 
                length(internal.brlen))^c(1:length(internal.brlen))), 
                fair.proportion = {
                  for (j in 1:length(nodes)) {
                    sons <- .node.desc(tree, nodes[j])
                    n.descendents <- length(sons$tips)
                    if (j == 1) portion <- n.descendents else portion <- c(n.descendents, 
                      portion)
                  }
                  1/portion
                })
        }
        ED <- sum(internal.brlen, tree$edge.length[which.edge(tree, 
            spp)])
        if (i == 1) 
            w <- ED
        else w <- c(w, ED)
    }
    return(w)
}

```

Evolutionary distinctiveness is our basic measure of phylogenetic isolation. #This should likely be 'fair proportions' instead of 'equal.splits'.
```{r}
      library(phytools)
      library(FARM)
      # Calculate ED
      # Using equal.splits method, faster computation
     # Evolutionary_distinctiveness_i <- evol.distinct2(this_tree, type = "equal.splits")  
      
      # ED - Summed evolutionary distinctiveness
     # Evolutionary_distinctiveness_sum <- sum(Evolutionary_distinctiveness_i)
```


```{r}
#Evolutionary_distinctiveness_sum
```



We can run some standard summary statistics (mean and variance) on this ED measure. var(Ed) shows up close to VPD on the PCAs in the intro. Tucker(2016)
```{r}
      # mean(ED)
     # mean_Phylogenetic_isolation <- mean(Evolutionary_distinctiveness_i)
      
      # var(ED)
      #variance_Phylogenetic_isolation <- var(Evolutionary_distinctiveness_i)
```



```{r}
#mean_Phylogenetic_isolation
#variance_Phylogenetic_isolation
```

#### Mean evolutionary distinctiveness (divergence indices)
The divergence indices version for $ED$ is mean evolutionary distinctiveness, $MED$. The mean of evolutionary distinctiveness. Redding (2003) and Isaac (2007)
$$
MED = \dfrac
{\sum_{i} ED_{i}}
{S} \\
\ \\
with \\
\ \\
ED_{i} = \sum_{b \in B_{t_{i}}} \dfrac{L_{b}}{S_{b}}
$$
#### Entropy measure of evolutonary distinctiveness (regularity indices)
The regularity indices for $ED$/phylogenetic isolation are $H_{ED}$, $E_{ED}$, $var(ED)$, $H_{AED}$

$H_{ED}$, Entropy measure of evolutionary distinctiveness, is the shannon index applied to evolutionary distinctiveness values. Cadotte et al. (2010)
$$
H_{ED} = -\sum_{i=1}^{S}
((\dfrac{ED_{i}}{\sum_{i=1}^{S} ED_{i}})
* \ln (\dfrac{ED_{i}}{\sum_{i=1}^{S} ED_{i}}))
$$

$E_{ED}$, Equitability of evolutionary distinctiveness, is $H_{ED}$ controlled for species richness. Cadotte et al. (2010)

$$
E_{ED} = \dfrac{H_{ED}}{\ln(S)}
$$
$var(ED)$, Variance in evolutinoary distinctiveness, is the variance of species evolutionary distinctiveness. Tucker (2016)

$$
var(ED) = \dfrac{1}{S-1} *
\sum_{i=1}^{S}
(ED_{i}-\dfrac{\sum_{i=1}^{S} ED_{i}}{S})^2
$$
$H_{ED_{Ab}}$, Abundance-weighted version of $H_{ED}$. Cadotte et al. (2016)

$$
H_{ED_{Ab}} = -\sum_{i=1}^{S}
(\dfrac{n_{i}AED_{i}}{\sum_{i=1}^{S} n_{i}AED_{i}} *
\ln(\dfrac{n_{i}AED_{i}}{\sum_{i=1}^{S} n_{i}AED_{i}}))
$$


## Beta diversity
Beta-diversity indices
I. Richness indices (presence–absence data)
II. Divergence indices (using pairwise distances among species)
1. Presence/absence data
A.	Decomposition into , ,  diversities
B.	Direct dissimilarities
1.	Using all distances
2.	Using nearest distances
2. Abundance data
A.	Decomposition into , ,  diversities
B.	Direct dissimilarities
III. Parametric indices
1. Equivalent numbers
2. Entropy


## Tree topology
Tree topology is a measure of the shape of the overall tree. The tree can be lopsided side-to-side or front-to-back. 

Our most trusted index for the tippy vs trunky of a tree is the gamma index, $\gamma$.The index characterizes the distribution of branching events within the tree. Trees with γ < 0 have relatively longer branches towards the tips of the phylogeny (tippy trees), whereas trees with γ > 0 have relatively longer inter-nodal distances towards the root of the phylogeny (stemmy trees).  tk represents an ‘evolutionary period’ (limits are given by two speciation events) or equivalently an internode distance. Pybus and Harvey (2000)

$$
  \gamma = \dfrac
  {(\dfrac{1}{S-2}* \sum_{i=2}^{S-1} (\sum_{k=2}^{i} Kt_{k}))- \dfrac{1}{2} * \sum_{j=2}^{S} jt_{j}}
  {(\sum_{j=2}^{S} jt_{j}) * \sqrt{\dfrac{1}{12*(S-2)}}}
$$
```{r}
      library(phytools)
      
      ltts <- ltt(this_tree, gamma = TRUE, plot = FALSE)
```

```{r}
ltts
str(ltts)
```

```{r}
      lineages_through_time <- as.numeric(ltts[[1]])
      time_steps <- as.numeric(ltts[[2]])
      #extract Gamma index
      gamma <- ltts[[3]]
      gamma_p_value <- ltts[[4]]
```


```{r}
lineages_through_time 
time_steps 
gamma 
gamma_p_value 
```


There are two other regularly used metrics that include abundance measures. Note: we don't have abundance measures for D-place data. 

$IAC$, imbalance of abundance at the clade level, quantifies the relative deviation in the abundance distribution from a null case where individuals are evenly partitioned between clade splits. $v$ is the number of nodes in the phylogenetic tree.  $n_{i}$ is, as defined above, the abundance of species $i$ in the assemblage.  $\eta_{k}$ is the expected abundance species $i$ would have if the abundance was randomly split among lineages in the phylogenetic tree at each speciation event.  is the number of lineages originating at node $k$ in the set $s(k,root)$, which contains the nodes located on the path between node $k$ and the root of the phylogenetic tree. N is the total assemblage abundance. Cadotte et al. (2010)

$$
\dfrac{\sum_{i=1}^{S}  |n_{i} - \hat{n_{i}}|}
{v} \\
\ \\
where \\ 
\ \\
\hat{n_{i}} = \dfrac{N}{\prod_{K \in s(i, root)}\eta_{k}}
$$

$I_{c}$, the Colless index, is the sum of the absolute differences in species richness between sister-clades at each internal node. For fully resolved trees, each internal node defines two sister-clades. $S_{1k}$ is the number of species descending from the first clade defined by node k and $S_{2k}$ that of the second clade. $v$ is, as defined above, the number of nodes in the phylogenetic. Colless (1982)

$$
I_{c} = \sum_{k=1}^{v} |S_{1k} - S_{2k}|
$$




## Macroevolutionary rates

```{r}

#function name = bd, function input = tree of type 'phylo'
print(bd) 

```



```{r}
 ## Speciation vs extinction rates and Net diversification
     

      bds <- bd(this_tree)
      speciation_rate <- bds[1]
      extinction_rate <- bds[2]
      extinction_per_speciation <- bds[3]
      speciation_minus_extinction <- bds[4]
  
```


```{r}


      ## Speciation vs extinction rates and Net diversification dependent on trait
     # N.for.dom <- table(this_world[, 6])
  #    if(length(N.for.dom) == 2) {
        par.div.dep <- DivDep( mytree = this_tree, myWorld = this_world)
        trait_1_speciation <- par.div.dep[1]
        trait_2_speciation <- par.div.dep[2]
        trait_1_extinction <- par.div.dep[3]
        trait_2_extinction <- par.div.dep[4]
        transition_from_trait_1_to_2 <- par.div.dep[5]
        transition_from_trait_2_to_1 <- par.div.dep[6]
        transition_rate_ratio_1to2_over_2to1 <- transition_from_trait_1_to_2/transition_from_trait_2_to_1
      
```




```{r}
library(ROCR)
        ## Crown age per trait AUC and effect size
        tip.length <- this_tree$edge.length[this_tree$edge[, 2] %in% 1:Ntip(this_tree)]
        tip.length <- (tip.length - min(tip.length)) / (max(tip.length) - min(tip.length))
        this_trait <- this_world[match(this_tree$tip.label, this_world[, 8]), 6]
        tip.length.2 <- tip.length[this_trait == 2]
        tip.length.1 <- tip.length[this_trait == 1]
        model <- glm(as.factor(this_trait) ~ log(tip.length + 1),
                     family = "binomial")
        effect.size <- model$coefficients[2]
       # plot(y = this_trait - 1, x= log(tip.length))
        p <- predict(model, as.factor(this_trait), type = "resp")
       # points(y = p, x = log(tip.length), col = "red")
        pr <- prediction(p, as.factor(this_trait))
        auc.model <- performance(pr, measure = "auc")@y.values[[1]]

      
```



```{r}
  ## Phylogenetic signal (D)
        Phylogenetic_signal <- Dsig(mytree = this_tree, myWorld = this_world)
        
```


# Spatial Locations

```{r}

library(spdep)
 ## Spatial Analysis
        nbs0 <- knearneigh(as.matrix(this_world[, 2:3]), k = 7, longlat = TRUE)
        nbs <- knn2nb(nbs0, sym = TRUE) # 7 symmetric neighbors
        nbs.listw <- nb2listw(nbs)
        factors.nbs <- as.factor(ifelse(is.na(this_world[, 6]), 3, this_world[, 6]))
        spatial.tests <- joincount.test(fx = factors.nbs, listw = nbs.listw)
        spatial.tests.fora <- spatial.tests[[1]]$statistic
        spatial.tests.dom <- spatial.tests[[2]]$statistic
        #prevalence <- (N.for.dom[1] - N.for.dom[2]) / sum(N.for.dom)
```


```{r}
results_summary_matrix_1 <- cbind(

        number_of_branches,
        #Pylo_diversity_is_sum_of_BL,
        #average_phylogenetic_diversity_is_mean_of_BL,
        #variance_Pylo_diversity_is_variance_of_BL,

        F_quadratic_entropy_is_sum_of_PD,
        Mean_pairwise_distance,
        variance_pairwise_distance,

        #Evolutionary_distinctiveness_sum,
        #mean_Phylogenetic_isolation,
        #variance_Phylogenetic_isolation,

        gamma,
        gamma_p_value,
        speciation_rate,
        extinction_rate,
        extinction_per_speciation,
        speciation_minus_extinction,
        trait_1_speciation,
        trait_2_speciation ,
        trait_1_extinction ,
        trait_2_extinction ,
        transition_from_trait_1_to_2 ,
        transition_from_trait_2_to_1 ,
        transition_rate_ratio_1to2_over_2to1 ,
        Phylogenetic_signal,
        spatial.tests.fora,
        spatial.tests.dom,
       # prevalence,
       # auc.model,
        effect.size
      )
      #rownames(results_summary_matrix_1) <- 1

      #results_summary_matrix_2 <- cbind(
      #  c(Evolutionary_distinctiveness,NA),
      #  lineages_through_time,
      #  time_steps
      #)
      #colnames(results_summary_matrix_2) <- c("Evolutionary_distinctiveness", "lineages_through_time", "time_steps")
      #head(results_summary_matrix_2)

      ### Returns from function in list form
      #returns <- list(
        #Branch_Lengths,
        #Pairwise_dist,
      #  results_summary_matrix_1,
      #  results_summary_matrix_2

      #)

      #names(returns) <- c(
        #"Branch_Lengths",
        #"Pairwise_distance",
       # "results_summary_of_single_value_outputs",
       # "results_summary_matrix_of_multi_value_outputs"
      #)
      
      
```



## Module2() returns these two matrices as a list 



```{r, echo=FALSE}
#kable(returns$results_summary_of_single_value_outputs, caption= "This is our world")
     
```


```{r, echo=FALSE}
 #kable(returns$results_summary_matrix_of_multi_value_outputs, caption= "This is our world")
```




<!--chapter:end:Tree_and_space_analysis_notebook_2.Rmd-->

---
title: 'D-place FARM documentation: Running simulations on the cluster'
author: "Ty Tuff, Bruno Vilela, and Carlos Botero"
date: 'project began: 15 May 2016, document updated: `r strftime(Sys.time(), 
format  = "%d %B %Y")`'
output:
  html_notebook: default
  html_document: default
  pdf_document: default
  word_document: default
bibliography: FARM package.bib
---

Running an R script on the cluster requires two parts: an R script with the code
to be run and a PBS script to control how that R script is run on the cluster.


There are two different clusters at Wustl.edu, an old and a new cluster. This
script runs the R code on the new cluster. 


```{bash}
#!/bin/bash
#PBS -N Four_model_run 
#PBS -V 					
#PBS -l walltime=23:59:00				
#PBS -l pmem=1200mb 
#PBS -l nodes=1:ppn=1:haswell 
#PBS -t 1-1000


echo $PBS_ARRAYID

cd /home/cbotero/mydirectory/Four_model_compare
module load R

export R_LIBS=$HOME/rlibs
#R CMD INSTALL --library=/home/ttuff/rlibs  FARM_1.0.tar.gz

Rscript --vanilla ./FARM_four_model_compare.R ${PBS_ARRAYID}
```


We were regularly causing problems running to many jobs on the new cluster and 
we were asked to move to the old cluster. This cluster has slower individual 
processors, but we can run more jobs at one time, so productivity has stayed 
about the same. 

```{bash}
#!/bin/bash
#PBS -N FARM_third_run_old 
#PBS -V 					
#PBS -l walltime=160:00:00				
#PBS -l pmem=1200mb 
#PBS -q old
#PBS -l nodes=1:ppn=1:nehalem 
#PBS -t 1-500


echo $PBS_ARRAYID

cd /home/ttuff/mydirectory/Four_model_compare
module load R

export R_LIBS=$HOME/rlibs
#R CMD INSTALL --library=/home/ttuff/rlibs  FARM_1.0.tar.gz

Rscript --vanilla ./FARM_four_model_compare.R ${PBS_ARRAYID} 

```


The final argument in the #PBS script above (#PBS -t 1-500) controls the serial 
running schema for running many simultanious instances of the R script at a 
time. This argument is passes to R as an integer value using the following
arguements inside R. 
```{r}

args <- commandArgs(trailingOnly = FALSE) #7 elements are passed from the PBS

NAI <- as.numeric(args[7]) # the seventh of those elements is the array integer.

```


Here is a working example of how to set up the R script.
```{r}
#install.packages("rfoaas")
library(rfoaas)

##If the PBS script started this code running and passed the number 13
##to this particular run of a larger serial set. 


#args <- commandArgs(trailingOnly = FALSE) 

NAI <- 13 #as.numeric(args[7])

	sayHello <- function(loop_number){
   	  print(paste0("I can count to ", loop_number, "!   ", cool(from="Ty")))
	}

	sayHello(NAI)

```


You logon to the cluster using linux/unix code from the command line 
terminal on you computer. Open the terminal and put in you login info.


```{bash}
ssh -Y ttuff@login.chpc.wustl.edu
password:_______
```

Upon first login, you will be in a folder called 'HOME' with a series of system 
files in it. You will want to use an FTP client to view and organize these 
files. I prefer Filezilla, but there are several other good clients available 
for free. Download filezilla, make sure it's in your applications folder, and
open it. You should see a window that looks like a newer version of this. The 
left panels are the files on the computer you're working from and the right
two panels will show the files on the server once you log in through Filezilla
also. 

![A fresh Filezilla window](Filezilla.png)

![Start a new server link](site manager.png)




![Start a new site and name that new site](New site name.png)


![Select a secure ssh file transfer protocol](SFTP.png)


![Login as normal](Choose normal.png)


![Enter password](Enter password.png)


![You need to create two new directories (folders) for us to work out of. Call one 'rlibs' and the other 'mydirectory'](Create two files within your home file.png)


![Within mydirectory, create another folder called 'Four_model_compare'](Create Four_models_compare folder.png)

![Within 'Four_model_compare', create two folders called 'Module_1_outputs' and 'Module_2_outputs'. Then drag three files from your computer files on the left to the server folders on the right: one .R file, one .pbs file, and a zip file with the package FARM in it. These should be named FARM_four_model_compare.R, FARM_four_model_compare_old_cluster.pbs, and FARM_1.0.tar.gz .](Create module folders.png)




Once logged in, you need to change the directory
```{bash}
cd /home/ttuff/mydirectory/Four_model_compare
```













![](Files from module 1.png)

![](Files from module 2.png)

















<!--chapter:end:Using_the_cluster_markdown.Rmd-->

